[Numpy-discussion] masked arrays as array indices (is a bad idea)

Pierre GM pgmdevlist@gmail....
Mon Sep 21 13:43:26 CDT 2009



On Sep 21, 2009, at 12:17 PM, Ryan May wrote:

> 2009/9/21 Ernest Adrogué <eadrogue@gmx.net>
> Hello there,
>
> Given a masked array such as this one:
>
> In [19]: x = np.ma.masked_equal([-1, -1, 0, -1, 2], -1)
>
> In [20]: x
> Out[20]:
> masked_array(data = [-- -- 0 -- 2],
>             mask = [ True  True False  True False],
>       fill_value = 999999)
>
> When you make an assignment in the vein of x[x == 0] = 25
> the result can be a bit puzzling:
>
> In [21]: x[x == 0] = 25
>
> In [22]: x
> Out[22]:
> masked_array(data = [25 25 25 25 2],
>             mask = [False False False False False],
>       fill_value = 999999)
>
> Is this the correct result or have I found a bug?
>
> I see the same here on 1.4.0.dev7400.  Seems pretty odd to me.  Then  
> again, it's a bit more complex using masked boolean arrays for  
> indexing since you have True, False, and masked values.  Anyone have  
> thoughts on what *should* happen here?  Or is this it?

Using a masked array in fancy indexing is always a bad idea, as
there's no way of guessing the behavior one would want for missing
values: should they be evaluated as False? As True? You should really
use the `filled` method to control the behavior explicitly.

>>> x[(x==0).filled(False)]
masked_array(data = [0],
             mask = [False],
       fill_value = 999999)
>>> x[(x==0).filled(True)]
masked_array(data = [-- -- 0 --],
             mask = [ True  True False  True],
       fill_value = 999999)
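
For the assignment in the original question, the same idea might look like
the sketch below. The names `y`, `z`, `cond_false` and `cond_true` are
illustrative only, and the exact repr differs between NumPy versions, so
the results are described in comments rather than shown as output.

```python
import numpy as np

x = np.ma.masked_equal([-1, -1, 0, -1, 2], -1)

# Treat masked entries as "no match": only the real 0 is replaced.
cond_false = (x == 0).filled(False)   # plain boolean ndarray
y = x.copy()
y[cond_false] = 25
# y keeps its original mask; its unmasked values are now 25 and 2

# Treat masked entries as "match": they are overwritten and unmasked.
cond_true = (x == 0).filled(True)
z = x.copy()
z[cond_true] = 25
# every entry of z is now unmasked: 25, 25, 25, 25, 2
```

Either way the index is an ordinary boolean array, so the choice about
missing values is made explicitly rather than left to the indexing code.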

P.

[If you're really interested:
When testing for equality, a masked array is first filled with 0 (that  
was the behavior of the first implementation of numpy.ma), tested for  
equality, and the mask of the result set to the mask of the input.   
When used in fancy indexing, a masked array is viewed as a standard  
ndarray by dropping the mask. In the current case, the combination is  
therefore equivalent to (x.filled(0)==0), which explains why the  
missing values are treated as True... I agree that the prefilling may  
not be necessary...]
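
The equivalence described in this note can be checked by hand. This is a
sketch that builds the index explicitly with `x.filled(0) == 0` instead of
relying on what `x == 0` does internally, since that detail describes the
implementation at the time of this post and may differ in other NumPy
versions; `idx` and `y` are illustrative names.

```python
import numpy as np

x = np.ma.masked_equal([-1, -1, 0, -1, 2], -1)

# Fill masked entries with 0 first, then compare: masked slots become True.
idx = x.filled(0) == 0      # plain ndarray: [True, True, True, True, False]

y = x.copy()
y[idx] = 25                 # overwrites (and unmasks) the first four entries
print(y.filled(-1))         # no masked entries remain, so nothing gets filled
```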

