[Numpy-discussion] Warnings in numpy.ma.test()
josef.pktd@gmai...
Thu Mar 18 15:15:42 CDT 2010
On Thu, Mar 18, 2010 at 3:46 PM, Christopher Barker
<Chris.Barker@noaa.gov> wrote:
> Gael Varoquaux wrote:
>> On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
>>> sure -- that's kind of my point -- if EVERY numpy array were
>>> (potentially) masked, then folks would write code to deal with them
>>> appropriately.
>>
>> That's pretty much saying: "I have a complicated problem and I want every
>> one else to have to deal with the full complexity of it, even if they
>> have a simple problem".
>
> Well -- I did say it was a fantasy...
>
> But I disagree -- having invalid data is a very common case. What we
> have now is a situation where we have two parallel systems, masked
> arrays and regular arrays. Each time someone does something new with
> masked arrays, they often find another missing feature, and have to
> solve that. Also, the fact that masked arrays are tacked on means that
> performance suffers.
>
> Maybe it would simply be too ugly, but if I were to start from the
> ground up with a scientific computing package, I would want to put in
> support for missing values from the start.
>
> There are some cases where it is simply too complicated or too expensive
> to handle missing values -- fine, then an exception is raised.
>
> You may be right about how complicated it would be, and what would
> happen is that everyone would simply put a:
>
>     if a.masked:
>         raise ValueError("I can't deal with masked data")
>
> stanza at the top of every new method they wrote, but I suspect that if
> the core infrastructure was in place, it would be used.
>
> I'm facing this at the moment: not a big deal, but I'm using histogram2d
> on some data. I just realized that it may have some NaNs in it, and I
> have no idea how those are being handled. I also want to move to masked
> arrays and have no idea if histogram2d can deal with those. At the
> least, I need to do some testing, and I suspect I'll need to do some
> hacking on histogram2d (or just write my own).
>
> I'm sure I'm not the only one in the world that needs to histogram some
> data that may have invalid values -- so wouldn't it be nice if that were
> already handled in a defined way?
histogram2d handles neither masked arrays nor arrays with NaNs
correctly, but assuming you want to drop all columns (that is, all
(x, y) pairs) that have at least one missing value, the fix is just one
small step. It takes more work if you want to replace the missing value
with the mean, a conditional prediction, or an interpolated value.
The dropping step could be included in the histogram function.
>>> x = np.ma.array([[1,2, 3],[2,1,1]], mask=[[0, 1,0], [0,0,0]])
>>> np.histogram2d(x[0],x[1],bins=3)
(array([[ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.]]),
 array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))
>>> x2=x[:,~x.mask.any(0)]
>>> np.histogram2d(x2[0],x2[1],bins=3)
(array([[ 0.,  0.,  1.],
       [ 0.,  0.,  0.],
       [ 1.,  0.,  0.]]),
 array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))
>>> x = np.array([[1.,np.nan, 3],[2,1,1]])
>>> x
array([[ 1., NaN, 3.],
[ 2., 1., 1.]])
>>> np.histogram2d(x[0],x[1],bins=3)
(array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]]),
 array([ NaN,  NaN,  NaN,  NaN]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))
>>> x2=x[:,np.isfinite(x).all(0)]
>>> np.histogram2d(x2[0],x2[1],bins=3)
(array([[ 0.,  0.,  1.],
       [ 0.,  0.,  0.],
       [ 1.,  0.,  0.]]),
 array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))
>>>
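
The two workarounds in the session (dropping masked columns and dropping
non-finite columns) could be folded into one small wrapper. This is only
a sketch: the name histogram2d_dropmissing is made up for illustration
and is not part of NumPy, and it assumes that "drop the whole (x, y)
pair" is the behaviour you want. Other policies, such as filling with
the mean, would need different code.

```python
import numpy as np

def histogram2d_dropmissing(x, y, bins=10, **kwargs):
    """Hypothetical helper, not a NumPy function.

    Drops every (x, y) pair in which either value is masked or
    non-finite, then passes the remaining pairs to np.histogram2d.
    """
    # getmaskarray returns a full boolean mask; it is all False for
    # plain ndarrays, so masked and unmasked inputs are handled alike.
    masked = np.ma.getmaskarray(x) | np.ma.getmaskarray(y)

    # Work on the underlying data as floats so np.isfinite applies.
    xd = np.ma.getdata(x).astype(float)
    yd = np.ma.getdata(y).astype(float)

    # Keep only pairs that are unmasked and finite in both coordinates.
    keep = ~masked & np.isfinite(xd) & np.isfinite(yd)
    return np.histogram2d(xd[keep], yd[keep], bins=bins, **kwargs)

# Same inputs as in the session above: one masked value, one NaN.
xm = np.ma.array([[1, 2, 3], [2, 1, 1]], mask=[[0, 1, 0], [0, 0, 0]])
xn = np.array([[1., np.nan, 3.], [2., 1., 1.]])

H_masked, xedges, yedges = histogram2d_dropmissing(xm[0], xm[1], bins=3)
H_nan, _, _ = histogram2d_dropmissing(xn[0], xn[1], bins=3)
```

With these inputs both calls keep the same two pairs, (1, 2) and (3, 1),
so the counts should agree with the x2 results in the session. Note that
the bin edges are computed from the surviving data only, so dropping
values can change the edges unless you pass explicit bins or a range.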
Josef
> -Chris
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> Chris.Barker@noaa.gov