[NumPy-Tickets] [NumPy] #1447: Inconsistent behavior when indexing a masked array with a bool array

NumPy Trac numpy-tickets@scipy....
Mon Apr 5 12:33:11 CDT 2010


#1447: Inconsistent behavior when indexing a masked array with a bool array
--------------------------------+-------------------------------------------
  Reporter:  nathanielpeterson  |       Owner:  somebody                 
      Type:  defect             |      Status:  closed                   
  Priority:  normal             |   Milestone:                           
 Component:  Other              |     Version:  1.3.0                    
Resolution:  invalid            |    Keywords:  bool-indexed masked array
--------------------------------+-------------------------------------------
Changes (by pierregm):

  * status:  new => closed
  * resolution:  => invalid


Comment:

 On Apr 2, 2010, at 1:08 AM, Nathaniel Peterson wrote:

 ''Is this behavior of masked arrays intended, or is it a bug''?

 It's not a bug; it's an unfortunate side effect of using boolean masked
 arrays as indices. Don't do that. Instead, fill the masked arrays with
 either True or False (depending on what you want) before indexing.
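 A minimal sketch of that advice, assuming Python 3 and a recent NumPy:
 calling {{{filled()}}} on the masked comparison result turns it into a plain
 boolean ndarray, so you decide explicitly whether masked positions are kept
 (True) or dropped (False). The variable names are illustrative only.

```python
import numpy as np

# Same setup as in the ticket: the NaN becomes a masked entry.
a = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
b = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
idx = (a == b)  # masked boolean array whose first entry is masked

# Decide explicitly what a masked comparison means before indexing.
keep_masked = a[idx.filled(True)]   # masked positions are selected
drop_masked = a[idx.filled(False)]  # masked positions are skipped

print(keep_masked.shape)  # (4,)
print(drop_masked.shape)  # (3,)
```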

 Now, for some explanations:
 {{{
 import numpy as np
 a=np.ma.fix_invalid(np.array([np.nan,-1,0,1]))
 b=np.ma.fix_invalid(np.array([np.nan,-1,0,1]))
 }}}
 When using {{{ma.fix_invalid}}}, the NaNs and infs are masked and the
 corresponding entries of the underlying data are set to the default fill
 value (1e+20 for floats). Thus, you have:
 {{{
 print(a.data)
 # [  1.00000000e+20  -1.00000000e+00   0.00000000e+00   1.00000000e+00]

 idx = (a == b)
 }}}
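 To see the masking and the fill value for yourself, the following sketch
 (assuming Python 3 and a recent NumPy, where {{{fix_invalid}}} copies its
 input by default) inspects the mask and the underlying data:

```python
import numpy as np

raw = np.array([np.nan, -1, 0, 1])
a = np.ma.fix_invalid(raw)

print(a.mask.tolist())   # [True, False, False, False]: the NaN is masked
print(a.fill_value)      # 1e+20: default fill value for float64
print(a.data[0])         # 1e+20: the NaN slot now holds the fill value
print(np.isnan(raw[0]))  # True: the input was copied, not modified
```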

 Now, you compare two masked arrays. In practice, the arrays are first
 filled with 0, compared, and the mask is created afterwards. In the
 current case, we get a new masked array whose first entry is masked
 (because a[0] is masked). Because the two filled arrays are identical,
 the underlying ndarray of the result is {{{[True True True True]}}}.
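 The "fill with 0, compare, then mask" description can be mimicked by hand.
 This is a sketch of the semantics described above, assuming Python 3 and a
 recent NumPy; it is not NumPy's actual implementation (the internals of
 masked-array comparisons may differ between releases), but on this input
 the data and mask of {{{idx}}} match:

```python
import numpy as np

a = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
b = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
idx = (a == b)

# Fill both operands with 0 and compare the plain ndarrays...
filled_cmp = a.filled(0) == b.filled(0)
print(filled_cmp.tolist())  # [True, True, True, True]

# ...which matches the data underlying idx; the mask combines those of a and b.
print(idx.data.tolist())    # [True, True, True, True]
print(idx.mask.tolist())    # [True, False, False, False]
```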

 {{{
 print(a[idx][3])
 # 1.0
 }}}


 The fun starts now: you are using idx, a masked array, as an index.
 Because numpy's fancy indexing mechanism doesn't know how to process
 masked arrays, their underlying ndarrays are used instead: a[idx] is
 equivalent to a[np.array(idx)]. Because {{{np.array(idx) == idx.data ==
 [True True True True]}}}, a[idx] returns all of a, hence the (4,) shape.
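 A short check of the claim that indexing with {{{idx}}} uses only its
 underlying data, assuming Python 3 and a recent NumPy (the name
 {{{plain}}} is illustrative):

```python
import numpy as np

a = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
b = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
idx = (a == b)

# np.array() drops the mask and keeps only the raw booleans.
plain = np.array(idx)
print(type(plain).__name__)  # ndarray
print(plain.tolist())        # [True, True, True, True]

# Indexing with idx behaves like indexing with its data, so the masked
# entry idx[0] still selects a[0].
print(a[idx].shape)    # (4,)
print(a[plain].shape)  # (4,)
```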

 ''But if I change the first element of b from np.nan to 2.0 then
 a[idx2] has shape (3,) despite np.alltrue(idx==idx2) being True:''

 {{{
 c=np.ma.fix_invalid(np.array([2.0,-1,0,1]))
 idx2=(a==c)
 }}}
 So, c is a masked array without any masked values. When comparing a and c,
 the arrays are once again filled with 0 before the comparison. Since the
 masked a[0] becomes 0 while c[0] is 2.0, the ndarray underlying idx2 is
 [False True True True], and its first item is masked (still because a[0]
 is masked). If you use idx2 for indexing, it's converted to an ndarray,
 and you end up with the last three items of a (hence the (3,) shape).
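 The same kind of check for {{{idx2}}}, under the same assumptions
 (Python 3, a recent NumPy):

```python
import numpy as np

a = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
c = np.ma.fix_invalid(np.array([2.0, -1, 0, 1]))
idx2 = (a == c)

print(np.ma.count_masked(c))  # 0: c has no masked values
print(idx2.data.tolist())     # [False, True, True, True]
print(idx2.mask.tolist())     # [True, False, False, False]
print(a[idx2].shape)          # (3,): only the raw data of idx2 is used
```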

 {{{
 assert np.all(idx == idx2)  # np.all stands in for np.alltrue, deprecated in newer NumPy
 }}}
 You compare the two masked arrays idx and idx2. Remember the filling with
 0 that happens under the hood: the masked first entry of each is filled
 with 0 (False), so you end up comparing [False True True True] and
 [False True True True] with np.alltrue, which of course returns True.
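 Finally, a sketch showing why {{{idx}}} and {{{idx2}}} compare equal even
 though their raw data differ (same assumptions; {{{np.all}}} stands in for
 {{{np.alltrue}}}, which newer NumPy releases have deprecated):

```python
import numpy as np

a = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
b = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
c = np.ma.fix_invalid(np.array([2.0, -1, 0, 1]))
idx, idx2 = (a == b), (a == c)

# The raw data differ in the first (masked) slot...
print(idx.data.tolist())        # [True, True, True, True]
print(idx2.data.tolist())       # [False, True, True, True]

# ...but filling that masked slot with 0 (False) makes them identical.
print(idx.filled(0).tolist())   # [False, True, True, True]
print(idx2.filled(0).tolist())  # [False, True, True, True]
print(bool(np.all(idx == idx2)))  # True
```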

 Moral of the story: don't use masked arrays in fancy indexing, as you may
 not get what you expect.
 I hope it clarified the situation a bit, but don't hesitate to ask more
 questions.
 Cheers
 P.

-- 
Ticket URL: <http://projects.scipy.org/numpy/ticket/1447#comment:1>
NumPy <http://projects.scipy.org/numpy>

