[Numpy-discussion] Why does np.nan{min, max} clobber my array mask?
David Carmean
dlc@halibut....
Sat Feb 13 21:04:10 CST 2010
I'm just starting to work with masked arrays and I've found some behavior that
definitely does not follow the Principle of Least Surprise:
I've generated a 2-d array from a list of lists, where the elements are floats with
a good number of NaNs. Inspection shows the expected numbers for ma.count() and
ma.count_masked().
However, as soon as I run np.nanmin() or np.nanmax() over it, all of the mask elements
are reset to False.
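The setup described above can be approximated on a small array. This is only a sketch: the 3x3 data below is a hypothetical stand-in for the real 99x56 array, and np.isnan() is used to count the NaNs directly so that the totals can be compared with ma.count() and ma.count_masked().

```python
import numpy as np
import numpy.ma as ma

# Hypothetical stand-in data (the array in this post is 99x56).
rows = [[1.0, float("nan"), 3.0],
        [float("nan"), float("nan"), 6.0],
        [7.0, 8.0, 9.0]]
uut = np.array(rows)

# Count NaNs directly with np.isnan instead of an arithmetic trick.
n_nan = int(np.isnan(uut).sum())
n_ok = uut.size - n_nan

# masked_invalid masks NaN and inf entries.
msk = ma.masked_invalid(uut)

print(n_ok, ma.count(msk))          # unmasked element counts should agree
print(n_nan, ma.count_masked(msk))  # masked element counts should agree
```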
(Pdb) flat = flatten(uut) # my own utility function
(Pdb) len ( [ x for x in flat if x+0 == x ] ) # only way I could figure to detect
4086
(Pdb) len ( [ x for x in flat if x+0 != x ] ) # 1458 NaNs in the set.
1458
(Pdb) msk = ma.masked_invalid(uut)
(Pdb) msk.shape
(99, 56)
(Pdb) ma.count(msk)
4086
(Pdb) ma.count_masked(msk)
1458
(Pdb) msk.hardmask
False
(Pdb) msk.harden_mask() # harden the mask first, for demo
masked_array(data =....
(Pdb) msk.hardmask
True
(Pdb) rslt_hm = np.nanmin(msk, axis=1)
(Pdb) rslt_hm.shape
(99,)
(Pdb) ma.count_masked(rslt_hm)
0
(Pdb) ma.count(rslt_hm)
99
# Is my original still OK?
(Pdb) msk
masked_array(data = ...
... [False False False ..., True True True]],
fill_value = 1e+20)
(Pdb) msk.soften_mask() # now re-soften the mask:
masked_array(data = ....
(Pdb) rslt_softmask = np.nanmin(msk, axis=1)
(Pdb) rslt_softmask.shape
(99,)
(Pdb) msk.mask.any()
False
# BAM! note: 'control' is a hardmasked control copy:
(Pdb) control.mask.any()
True
As the above shows, I discovered that I can work around this by setting the hardmask
property, but ... there is no mention of such a side-effect in the docs (including
the brand-new reference book).
Have I found a bug? This is NumPy 1.4.0 running under 64-bit Windows 7 (Python(x,y) distribution).
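One way to check for this kind of side effect, and to sidestep it, is sketched below. The helper name mask_survives and the sample data are hypothetical. The sketch uses the masked array's own min() method, which skips masked entries, rather than np.nanmin(); whether np.nanmin() alters the mask may depend on the NumPy version, so that call is left for the reader to try.

```python
import numpy as np
import numpy.ma as ma

def mask_survives(arr, func):
    """Return True if calling func(arr) leaves arr's mask unchanged."""
    before = ma.getmaskarray(arr).copy()   # snapshot of the mask
    func(arr)
    return bool(np.array_equal(ma.getmaskarray(arr), before))

# Hypothetical stand-in data.
uut = np.array([[1.0, np.nan, 3.0],
                [np.nan, np.nan, 6.0],
                [7.0, 8.0, 9.0]])
msk = ma.masked_invalid(uut)

# Mask-aware reduction: masked (NaN) entries are ignored.
row_min = msk.min(axis=1)

ok = mask_survives(msk, lambda a: a.min(axis=1))
print(row_min, ok)
```

Passing `lambda a: np.nanmin(a, axis=1)` as func would run the same check against np.nanmin() on whatever NumPy version is installed.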