[Numpy-discussion] Possible bug in indexed masked arrays

Pierre GM pgmdevlist@gmail....
Mon Apr 5 02:08:05 CDT 2010


On Apr 2, 2010, at 1:08 AM, Nathaniel Peterson wrote:
> 
> Is this behavior of masked arrays intended, or is it a bug? 

It's not a bug; it's an unfortunate side effect of using boolean masked arrays as indices. Don't do that. Instead, fill the masked arrays with either True or False (depending on what you want) before indexing with them.
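A minimal sketch of that advice, assuming a current NumPy. The index is built by hand as a stand-in for the comparison results discussed below (first entry masked). `filled(False)` drops the masked positions from the selection, while `filled(True)` keeps them:

```python
import numpy as np

a = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))

# Hand-built stand-in for a boolean masked index whose first entry is masked.
# The value hidden under the mask should not decide what gets selected.
idx = np.ma.array([True, True, True, True], mask=[True, False, False, False])

# Masked entries become False: masked positions are excluded.
excluded = a[idx.filled(False)]
# Masked entries become True: masked positions are included.
included = a[idx.filled(True)]

print(excluded.tolist())  # [-1.0, 0.0, 1.0]
print(included.tolist())  # [None, -1.0, 0.0, 1.0]  (None marks a masked item)
```

Either way the index is now a plain boolean ndarray, so what gets selected no longer depends on whatever value sits under the mask.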

Now, for some explanations:

> import numpy as np
> a=np.ma.fix_invalid(np.array([np.nan,-1,0,1]))
> b=np.ma.fix_invalid(np.array([np.nan,-1,0,1]))

When using ma.fix_invalid, the NaNs and infs are masked and the corresponding data values are set to a default fill value (1e+20 for floats). Thus, you have:
>>> print(a.data)
[  1.00000000e+20  -1.00000000e+00   0.00000000e+00   1.00000000e+00]
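A quick check of what fix_invalid leaves behind, written for a current NumPy: the NaN slot is masked, and its data is overwritten with the array's fill value:

```python
import numpy as np

# NaN (and +/-inf) entries get masked; their data is overwritten with the
# array's fill value, which defaults to 1e+20 for float arrays.
a = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))

print(a.mask.tolist())  # [True, False, False, False]
print(a.data[0])        # 1e+20
```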

> idx=(a==b)

Now you compare two masked arrays. In practice, both arrays are first filled with 0, then compared, and the mask of the result is created afterwards. In the current case, we get a new masked array whose first entry is masked (because a[0] is masked). Because the two filled ndarrays are identical, the ndarray underlying the result is [True True True True].
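To make the fill-compare-mask sequence concrete, here is a hand-rolled emulation of it. This is only a sketch of the behaviour described above, not the actual MaskedArray.__eq__ code, whose internals have changed across NumPy versions:

```python
import numpy as np

a = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
b = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))

# Step 1: fill masked entries with 0 and compare the plain ndarrays.
data = a.filled(0) == b.filled(0)
# Step 2: the result is masked wherever either operand was masked.
mask = np.logical_or(np.ma.getmaskarray(a), np.ma.getmaskarray(b))
idx = np.ma.array(data, mask=mask)

print(idx.data.tolist())  # [True, True, True, True]
print(idx.mask.tolist())  # [True, False, False, False]
```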

> print(a[idx][3])
> # 1.0


The fun starts now: you are using idx, a masked array, as an index. Because NumPy's fancy indexing mechanism doesn't know how to process masked arrays, the underlying ndarray is used instead and the mask is ignored. In other words, a[idx] is equivalent to a[np.array(idx)]. Because np.array(idx) == idx.data == [True True True True], a[idx] returns all of a, hence the (4,) shape.
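The equivalence can be checked directly. The index below is hand-built to match the description of idx (data all True, first entry masked), so the example does not depend on how a given NumPy version implements the comparison:

```python
import numpy as np

a = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))

# Stand-in for idx = (a == b): data all True, first entry masked.
idx = np.ma.array([True, True, True, True], mask=[True, False, False, False])

# Fancy indexing only sees the underlying data; the mask plays no part.
via_masked = a[idx]
via_data = a[np.asarray(idx)]

print(np.asarray(idx).tolist())          # [True, True, True, True]
print(via_masked.shape, via_data.shape)  # (4,) (4,)
```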

> But if I change the first element of b from np.nan to 2.0 then
> a[idx2] has shape (3,) despite np.alltrue(idx==idx2) being True:
> 
> c=np.ma.fix_invalid(np.array([2.0,-1,0,1]))
> idx2=(a==c)

So c is a masked array without any masked values. When comparing a and c, the arrays are once again filled with 0 before the comparison. Since the filled a[0] is 0 while c[0] is 2.0, the ndarray underlying idx2 is [False True True True], and its first item is masked (still because a[0] is masked). If you use idx2 for indexing, it's converted to an ndarray, and you end up with the last three items of a (hence the (3,) shape).
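The same fill-with-0 emulation, applied to a and c, reproduces the data underlying idx2 and the three-item result (a sketch of the described behaviour, not NumPy's internal code):

```python
import numpy as np

a = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
c = np.ma.fix_invalid(np.array([2.0, -1, 0, 1]))  # nothing to mask

# Filling with 0 turns a[0] into 0, which differs from c[0] == 2.0.
idx2_data = a.filled(0) == c.filled(0)
print(idx2_data.tolist())  # [False, True, True, True]

# Indexing with that boolean array keeps only the last three items.
print(a[idx2_data].tolist())  # [-1.0, 0.0, 1.0]
```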

> assert(np.alltrue(idx==idx2))

Now you compare the two masked arrays idx and idx2. Remember the filling with 0 that happens under the hood: the masked first entry of each becomes False, so you end up comparing [False True True True] with [False True True True], and np.alltrue of course returns True, even though idx and idx2 select different items of a.
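The mismatch between "equal after filling" and "different where indexing looks" can be shown with hand-built stand-ins for idx and idx2. The sketch uses np.all, which does the same job as np.alltrue (the latter has been deprecated in recent NumPy releases):

```python
import numpy as np

# Stand-ins for idx and idx2: same mask, different values under the mask.
idx = np.ma.array([True, True, True, True], mask=[True, False, False, False])
idx2 = np.ma.array([False, True, True, True], mask=[True, False, False, False])

# Once the masked slot is filled, the two indices look identical...
same_after_fill = np.all(idx.filled(False) == idx2.filled(False))
# ...but the raw data that fancy indexing actually uses differs.
same_raw_data = np.array_equal(np.asarray(idx), np.asarray(idx2))

print(same_after_fill, same_raw_data)  # True False
```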

Moral of the story: don't use masked arrays for fancy indexing, as you may not get what you expect.
I hope it clarified the situation a bit, but don't hesitate to ask more questions.
Cheers
P.