[Numpy-discussion] Numpy Array of dtype=object with strings and floats question
Darryl Wallace
darryl.wallace@prosensus...
Tue Nov 10 14:08:27 CST 2009
Thanks for the help,
I'll test out this simple example.
On Tue, Nov 10, 2009 at 2:28 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Tue, Nov 10, 2009 at 11:14 AM, Keith Goodman <kwgoodman@gmail.com>
> wrote:
> > On Tue, Nov 10, 2009 at 10:53 AM, Darryl Wallace
> > <darryl.wallace@prosensus.ca> wrote:
> >> I currently do as you suggested. But when the dataset size becomes large,
> >> it gets to be quite slow due to the overhead of python looping.
> >
> > Are you using a for loop? If so, something like this might be faster:
> >
> > >> x = [1, 2, '', 3, 4, 'String']
> > >> from numpy import nan
> > >> [(z, nan)[type(z) is str] for z in x]
> > [1, 2, nan, 3, 4, nan]
> >
> > I use something similar in my code, so I'm interested to see if anyone
> > can speed things up using python or numpy, or both. I run it on each
> > row of the file replacing '' with None. Here's the benchmark code:
> >
> > >> x = [1, 2, '', 4, 5, '', 7, 8, 9, 10]
> > >> timeit [(z, None)[z is ''] for z in x]
> > 100000 loops, best of 3: 2.32 µs per loop
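For reference, the list-comprehension approach quoted above can be wrapped so that it goes straight from an object array to a float array. This is only a sketch under assumptions: the helper name `to_float_array` is made up for illustration, and it assumes every non-string entry is already numeric.

```python
import numpy as np

def to_float_array(values):
    """Return a float64 array with every string entry replaced by NaN.

    Hypothetical helper, not part of NumPy. Assumes all non-string
    entries can be converted to float.
    """
    return np.array(
        [np.nan if isinstance(v, str) else v for v in values],
        dtype=float,
    )

# Mixed object array like the one in the subject line.
x = np.array([1, 2, '', 3, 4, 'String'], dtype=object)
cleaned = to_float_array(x)
print(cleaned)
```

The `isinstance` check tests the type rather than object identity. A test such as `z is ''` only works when the interpreter happens to reuse the same string object, and Python 3.8 and later emit a SyntaxWarning for it, so `z == ''` is the safer spelling when only empty strings should be replaced.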
>
> If there are few missing values (my use case), this seems to be faster:
>
> def myfunc(x):
>     while '' in x:
>         x[x.index('')] = None
>     return x
>
> >> timeit myfunc(x)
> 1000000 loops, best of 3: 697 ns per loop
>
> Note that it works in place.
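A possible variant of the in-place idea quoted above, sketched here with a hypothetical name (`replace_empty_inplace`) and a hypothetical `missing` parameter: instead of rescanning the list from the start for every hit, it resumes the search just past the previous match, so the list is traversed once in total. Whether this is actually faster for short rows has not been measured here.

```python
def replace_empty_inplace(row, missing=None):
    """Replace every '' in row with `missing`, modifying row in place.

    Hypothetical helper for illustration; returns the same list object.
    """
    start = 0
    while True:
        try:
            # list.index(value, start) searches from position `start` onward.
            start = row.index('', start)
        except ValueError:
            # No more empty strings: done.
            return row
        row[start] = missing
        start += 1

x = [1, 2, '', 4, 5, '', 7, 8, 9, 10]
result = replace_empty_inplace(x)
print(x)
```

Because the list is modified in place, `result` and `x` refer to the same object. Pass a copy if the original row must be kept.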
--
______________________________________
Darryl Wallace: Project Leader
ProSensus Inc.
McMaster Innovation Park
175 Longwood Road South, Suite 301
Hamilton, Ontario, L8P 0A1
Canada (GMT -05:00)
Tel: 1-905-528-9136
Fax: 1-905-546-1372
Web site: http://www.prosensus.ca/
______________________________________