[Numpy-discussion] Numpy Array of dtype=object with strings and floats question

Darryl Wallace darryl.wallace@prosensus...
Tue Nov 10 12:53:21 CST 2009


Hello,

On Tue, Nov 10, 2009 at 1:32 PM, Gökhan Sever <gokhansever@gmail.com> wrote:

> On Tue, Nov 10, 2009 at 12:09 PM, Darryl Wallace
> <darryl.wallace@prosensus.ca> wrote:
> > Hello again,
> > The best way so far that's come to my attention is to use:
> > numpy.ma.masked_object
> > The problem with this is that it's looking for a specific instance of an
> > object.  So if the user had some elements of their array that were, for
> > example, "randomString" , then it would not be picked up
> > e.g.
> > ---
> > import numpy as np
> > import numpy.ma as ma
> > mixedArray = np.array([1, 2, '', 3, 4, 'randomString'], dtype=object)
> > mixedArrayMask = ma.masked_object(mixedArray, 'randomString').mask
> > ---
> > then mixedArrayMask will yield:
> >
> > array([ False, False, False, False, False, True])
> > Can anyone help me so that all strings are found in the array without
> > having to explicitly loop through them in Python?
> > Thanks,
> > Darryl
>
> Why not stick to a single missing-value code (MVC) for all the non-valid
> data? I don't know how the MA module would handle mixed MVCs in the same
> array without modifying the existing code. Otherwise, looping over the
> array and masking the str instances as NaN would be my alternative
> solution.
>

The reason I don't stick to a standard missing-value code is that a user
may import other things in the datasheet that we need, such as row or column
labels, or may get data from a source that reports missing data as its own
specific string.

I currently do as you suggested, looping over the array and masking the
string elements.  But when the dataset becomes large, this gets quite slow
because of the overhead of Python looping.
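
For reference, here is a minimal sketch of that loop-based approach. It is a
simplified illustration, not the actual production code: the function name
mask_strings_loop is made up for this example, and it assumes Python 3, where
the type check is against str.

```python
import numpy as np
import numpy.ma as ma


def mask_strings_loop(values):
    """Return a float masked array in which every str element is masked.

    Hypothetical helper for illustration; it visits each element in Python,
    which is the slow part on large datasets.
    """
    values = np.asarray(values, dtype=object)
    data = np.empty(values.shape, dtype=float)
    mask = np.zeros(values.shape, dtype=bool)
    for i, v in enumerate(values.flat):
        if isinstance(v, str):
            # Strings (including the empty string) become masked NaNs.
            data.flat[i] = np.nan
            mask.flat[i] = True
        else:
            data.flat[i] = v
    return ma.masked_array(data, mask=mask)


mixedArray = np.array([1, 2, '', 3, 4, 'randomString'], dtype=object)
maskedArray = mask_strings_loop(mixedArray)
print(maskedArray.mask)
```

Unlike masked_object with a single value, this masks both '' and
'randomString', but it costs one Python-level type check per element.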

Thanks


>
>
> >
> > On Fri, Nov 6, 2009 at 3:56 PM, Darryl Wallace <darryl.wallace@prosensus.ca>
> > wrote:
> >>
> >> What I'm doing is importing some data from Excel, and sometimes there are
> >> strings in the worksheet.  Oftentimes a user will use an empty cell or a
> >> string to represent data that is missing.
> >> e.g.
> >> import numpy as np
> >> mixedArray = np.array([1, 2, '', 3, 4, 'String'], dtype=object)
> >> Two questions:
> >> 1) Is there a quick way to find the elements in the array that are
> >> strings without iterating over each element in the array?
> >> or
> >> 2) Could I quickly turn it into a masked array of type float where all
> >> string elements are set as missing points?
> >> I've been struggling with this for a while and can't come across a method
> >> that will allow me to do it without iterating over each element.
> >> Any help or pointers in the right direction would be greatly appreciated.
> >> Thanks,
> >> Darryl
> >
> >
> >
> > --
> > ______________________________________
> > Darryl Wallace: Project Leader
> > ProSensus Inc.
> > McMaster Innovation Park
> > 175 Longwood Road South, Suite 301
> > Hamilton, Ontario, L8P 0A1
> > Canada        (GMT -05:00)
> >
> > Tel:       1-905-528-9136
> > Fax:       1-905-546-1372
> >
> > Web site:  http://www.prosensus.ca/
> > ______________________________________
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
>
>
>
> --
> Gökhan
>
