[Numpy-discussion] Numpy Array of dtype=object with strings and floats question

Darryl Wallace darryl.wallace@prosensus...
Tue Nov 10 14:08:27 CST 2009


Thanks for the help,

I'll test out this simple example.

On Tue, Nov 10, 2009 at 2:28 PM, Keith Goodman <kwgoodman@gmail.com> wrote:

> On Tue, Nov 10, 2009 at 11:14 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
> > On Tue, Nov 10, 2009 at 10:53 AM, Darryl Wallace <darryl.wallace@prosensus.ca> wrote:
> >> I currently do as you suggested.  But when the dataset size becomes large,
> >> it gets to be quite slow due to the overhead of python looping.
> >
> > Are you using a for loop? If so, something like this might be faster:
> >
> >>> x = [1, 2, '', 3, 4, 'String']
> >>> from numpy import nan
> >>> [(z, nan)[type(z) is str] for z in x]
> >   [1, 2, nan, 3, 4, nan]
> >
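The underlying goal in this thread is a float array built from a dtype=object array that mixes numbers and strings. A minimal sketch, assuming Python 3 and NumPy, and assuming every non-string entry is something float() accepts; `strings_to_nan` is a hypothetical helper name, not a NumPy function. It reuses the tuple-indexing idiom above (with isinstance) and casts the result to float64 in the same step:

```python
import numpy as np


def strings_to_nan(row):
    """Return a float64 array in which every str entry of `row` becomes NaN.

    Hypothetical helper: assumes every non-str entry is a number that
    float() can convert.
    """
    # isinstance(z, str) is True (index 1) or False (index 0), so it picks
    # either NaN or the original value out of the 2-tuple.
    return np.array([(z, np.nan)[isinstance(z, str)] for z in row], dtype=float)


# Iterating over an object array yields the stored Python objects.
mixed = np.array([1, 2, '', 3, 4, 'String'], dtype=object)
values = strings_to_nan(mixed)
print(values)
```

This still loops in Python, so it does not remove the per-element overhead; it only folds the replacement and the float conversion into one pass.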
> > I use something similar in my code, so I'm interested to see if anyone
> > can speed things up using python or numpy, or both. I run it on each
> > row of the file, replacing '' with None. Here's the benchmark code:
> >
> >>> x = [1, 2, '', 4, 5, '', 7, 8, 9, 10]
> >>> timeit [(z, None)[z is ''] for z in x]
> > 100000 loops, best of 3: 2.32 µs per loop
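`timeit` in these snippets is the IPython magic command. In a plain script, the standard-library timeit module gives a comparable best-of-3 measurement. A minimal sketch assuming Python 3; `blank_to_none` is a hypothetical name. It tests `z == ''` rather than `z is ''`, because `is` checks object identity and only matches when CPython happens to reuse the same empty-string object (Python 3.8 and later also emit a SyntaxWarning for `is` with a literal):

```python
import timeit

row = [1, 2, '', 4, 5, '', 7, 8, 9, 10]


def blank_to_none(row):
    """Return a new list with each '' replaced by None."""
    # `==` compares values, so it does not depend on string interning.
    return [(z, None)[z == ''] for z in row]


# Plain-Python stand-in for the IPython `timeit` magic: best of 3 repeats.
n = 10_000
best = min(timeit.repeat(lambda: blank_to_none(row), number=n, repeat=3))
print(f"{best / n * 1e6:.2f} usec per loop")
```

Absolute timings depend on the machine and Python version, so only the relative numbers between approaches are worth comparing.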
>
> If there are few missing values (my use case), this seems to be faster:
>
> def myfunc(x):
>     while '' in x:
>         x[x.index('')] = None
>     return x
>
> >> timeit myfunc(x)
> 1000000 loops, best of 3: 697 ns per loop
>
> Note that it works in place.
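A variation on myfunc, sketched under the same assumptions (Python 3; `blank_to_none_inplace` is a hypothetical name): passing a start position to list.index resumes the search after the previous match, so the row is scanned once rather than once per missing value. Because the function mutates its argument, timing it repeatedly on the same list only does replacement work on the first call; copying the row per call, as below, keeps the comparison with the list comprehension fair, at the cost of including the copy in the timing:

```python
import timeit


def blank_to_none_inplace(row, missing='', fill=None):
    """Replace every `missing` entry of `row` with `fill`, in place.

    list.index(value, start) resumes after the previous hit, so the
    list is scanned once instead of once per missing value.
    """
    i = 0
    while True:
        try:
            i = row.index(missing, i)
        except ValueError:  # no further `missing` entries
            return row
        row[i] = fill
        i += 1  # continue searching after the slot just filled


row = [1, 2, '', 4, 5, '', 7, 8, 9, 10]

# Copy the row on every call: timing the same list repeatedly would only
# do real work on the first call, since the function mutates its argument.
n = 10_000
best = min(timeit.repeat(lambda: blank_to_none_inplace(list(row)), number=n, repeat=3))
print(f"{best / n * 1e6:.2f} usec per loop (includes the list copy)")
```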



-- 
______________________________________
Darryl Wallace: Project Leader
ProSensus Inc.
McMaster Innovation Park
175 Longwood Road South, Suite 301
Hamilton, Ontario, L8P 0A1
Canada        (GMT -05:00)

Tel:       1-905-528-9136
Fax:       1-905-546-1372

Web site:  http://www.prosensus.ca/
______________________________________

