[NumPy-Tickets] [NumPy] #1974: Subscript assignment on `hard_mask` masked array produces wrong result whenever the subscripted mask returns all False (i.e. `nomask`)

NumPy Trac numpy-tickets@scipy....
Tue Nov 8 12:17:18 CST 2011


#1974: Subscript assignment on `hard_mask` masked array produces wrong result
whenever the subscripted mask returns all False (i.e. `nomask`)
----------------------------------+-----------------------------------------
 Reporter:  codewarrior           |       Owner:  pierregm   
     Type:  defect                |      Status:  new        
 Priority:  normal                |   Milestone:  Unscheduled
Component:  numpy.ma              |     Version:  1.6.0      
 Keywords:  MaskedArray, setitem  |  
----------------------------------+-----------------------------------------
 Subscript assignment on a `hard_mask` masked array, using a slice or other
 index that selects a fully unmasked section of the array, assigns only to
 the second item of that section instead of to the entire section.

 The cause is that mask_or reduces the fully unmasked section of the mask
 to the single value 'nomask', which is then negated and used as the index
 for assigning into that section of the data array. This is incorrect
 because negating 'nomask' produces a single 'True' value, which is treated
 as the integer index '1' and so selects only the second item of the
 section.
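
 The chain described above can be reproduced step by step. The following is
 a minimal sketch, not the code from numpy.ma: the 5x6 shape is arbitrary,
 and int() is applied explicitly to emulate a scalar True being used as the
 index 1, as reported for 1.6 (an assumption made here because other NumPy
 versions may treat a boolean scalar index differently).

```python
import numpy as np
import numpy.ma as ma

# A fully unmasked 5x6 section: every mask entry is False.
section_mask = np.zeros((5, 6), dtype=bool)

# mask_or shrinks an all-False result down to the scalar `nomask`.
mindx = ma.mask_or(section_mask, ma.nomask, copy=True)
collapsed = mindx is ma.nomask

# Negating the scalar gives a single True, not a 5x6 array of True.
negated = ~mindx

# Emulate the behaviour described in this ticket: True acts as index 1,
# so only the second item (row) of the section is assigned.
data = np.arange(30).reshape(5, 6)
data[int(negated)] = 333
```

 After this runs, only data[1] holds 333; the other rows are untouched,
 which matches the symptom in the report.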

 This appears to be a slip on line 3038 of numpy/ma/core.py, in the final
 else: clause of MaskedArray.__setitem__()

 The original code reads:

 {{{

 mindx = mask_or(_mask[indx], mval, copy=True)
 dindx = self._data[indx]
 if dindx.size > 1:
     dindx[~mindx] = dval
 elif mindx is nomask:
     dindx = dval

 }}}

 It seems like it should be checking 'mindx.size' and not 'dindx.size'
 because mask_or is free to shrink the return value down to 'nomask' which
 would have size 1. When I use this corrected code, I no longer observe the
 problem:

 {{{

 mindx = mask_or(_mask[indx], mval, copy=True)
 dindx = self._data[indx]
 if mindx.size > 1:
     dindx[~mindx] = dval
 elif mindx is nomask:
     dindx = dval
 }}}

 This was apparently fixed in
 https://github.com/numpy/numpy/commit/a6e869b70b09df9381d341ed0d2b18f88d8fe3d6
 but that fix can't be backported to 1.6 because it uses np.copyto().
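
 For illustration, here is a minimal sketch of how an assignment based on
 np.copyto() sidesteps the problem. It is not the code from the commit
 above, and the helper name assign_unmasked is hypothetical. The point is
 that the where= argument is broadcast over the destination, so a scalar
 True covers every element instead of being interpreted as an index.

```python
import numpy as np
import numpy.ma as ma

def assign_unmasked(dindx, mindx, dval):
    """Hypothetical helper: write dval into dindx wherever mindx is False."""
    # `where` is broadcast against dindx, so ~nomask (a scalar True)
    # selects every element rather than acting as the index 1.
    np.copyto(dindx, dval, where=~mindx)

# Case 1: the mask collapsed to nomask, so the whole section is assigned.
full = np.arange(30).reshape(5, 6)
assign_unmasked(full, ma.nomask, 333)

# Case 2: one masked entry, which must keep its old value.
partial = np.arange(6)
mask = np.array([False, True, False, False, False, False])
assign_unmasked(partial, mask, 333)
```

 In the first case every element of full becomes 333; in the second, only
 the entry whose mask is True keeps its original value.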


 Here is code that demonstrates the error.

 {{{

 >>> from numpy import *
 >>> a = arange(30)
 >>> a.shape=5,6
 >>> b = zeros_like(a)
 >>> m = ma.masked_array(a,b,hard_mask=True) #only happens when hard_mask is True
 >>> m

 masked_array(data =
  [[0 1 2 3 4 5]
  [6 7 8 9 10 11]
  [12 13 14 15 16 17]
  [18 19 20 21 22 23]
  [24 25 26 27 28 29]],
              mask =
  [[False False False False False False]
  [False False False False False False]
  [False False False False False False]
  [False False False False False False]
  [False False False False False False]],
        fill_value = 999999)

 >>> m[:] = 333
 >>> m                      #entire array should be 333 now
 masked_array(data =
  [[0 1 2 3 4 5]
  [333 333 333 333 333 333] #uh-oh, only the second row is set
  [12 13 14 15 16 17]
  [18 19 20 21 22 23]
  [24 25 26 27 28 29]],
              mask =
  [[False False False False False False]
  [False False False False False False]
  [False False False False False False]
  [False False False False False False]
  [False False False False False False]],
        fill_value = 999999)
 }}}
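
 The same steps can also be written as a script with an explicit check,
 which may be convenient as a regression test. This is a sketch that
 assumes a NumPy version containing the fix; on an affected version such as
 1.6.0 the first assertion is expected to fail.

```python
import numpy as np
import numpy.ma as ma

a = np.arange(30).reshape(5, 6)
b = np.zeros_like(a)                       # all-zero mask: nothing is masked
m = ma.masked_array(a, b, hard_mask=True)  # hard_mask=True triggers the path

m[:] = 333

# Nothing was masked, so every element should have been assigned.
assert (m.data == 333).all()
# The assignment must not have masked anything either.
assert not ma.getmaskarray(m).any()
```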

-- 
Ticket URL: <http://projects.scipy.org/numpy/ticket/1974>
NumPy <http://projects.scipy.org/numpy>