[Numpy-discussion] NA masks in the next numpy release?

Travis Oliphant oliphant@enthought....
Thu Oct 27 20:16:24 CDT 2011


That is a pretty good explanation.   I find myself convinced by Matthew's arguments.    I think that being able to separate ABSENT from IGNORED is a good idea.   I also like being able to control SKIP and PROPAGATE (but I think the current implementation allows this already). 

What is the counter-argument to this proposal?  

-Travis




On Oct 27, 2011, at 7:31 PM, Matthew Brett wrote:

> Hi,
> 
> On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant <oliphant@enthought.com> wrote:
>> So, I am very interested in making sure I remember the details of the counterproposal.    What I recall is that you wanted to be able to differentiate between a "bit-pattern" mask and a boolean-array mask in the API.   I believe currently even when bit-pattern masks are implemented the difference will be "hidden" from the user on the Python level.
>> 
>> I am sure to be missing other parts of the discussion as I have been in and out of it.
> 
> The ideas
> --------------
> 
> The question that we were addressing in the alter-NEP was: should
> missing values implemented as bitpatterns appear to be the same as
> missing values implemented with masks?  We said no, and Mark said yes.
> 
> To restate the argument in brief: Nathaniel and I and some others
> thought that there were two separable ideas in play:
> 
> 1) A value that is finally and completely missing. == ABSENT
> 2) A value that we would like to ignore for the moment but might want
> back at some future time == IGNORED
> 
> (I'm using the adjectives ABSENT and IGNORED here to be short for the
> objects 'absent value'  and 'ignored value'.  This is to distinguish
> from the verbs below).
> 
> We thought bitpatterns were a good match for the former, and masking
> was a good match for the latter.
> 
> We all agreed there were two things you might like to do with values
> that were missing in both senses above:
> 
> A) PROPAGATE; V + 1 == V
> B) SKIP; K + 1 == 1
> 
> (Note verbs for the behaviors).
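
A rough illustration of the two behaviors, using floating-point NaN and the existing np.sum / np.nansum functions as stand-ins. This is only an analogy (NaN is neither ABSENT nor IGNORED in the sense above), and the variable names are made up for the example:

```python
import numpy as np

# NaN stands in for a missing value here; this is an analogy only.
values = np.array([np.nan, 1.0])

# PROPAGATE: the missing value contaminates the result of the reduction.
propagated = np.sum(values)

# SKIP: the missing value is left out of the reduction.
skipped = np.nansum(values)

print(propagated, skipped)
```

The point is just that the same data can be reduced either way, so PROPAGATE versus SKIP is a choice made per operation rather than a property of the storage.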
> 
> I believe the original np.ma masked arrays always SKIP.
> 
> In [2]: a = np.ma.masked_array?
> In [3]: a = np.ma.masked_array([99, 2], mask=[True, False])
> In [4]: a
> Out[4]:
> masked_array(data = [-- 2],
>             mask = [ True False],
>       fill_value = 999999)
> In [5]: a.sum()
> Out[5]: 2
> 
> There was some discussion as to whether there was a reason to think
> that ABSENT should always or by default PROPAGATE, and IGNORED should
> always or by default SKIP.  Chuck was referring to this idea when he
> said further up this thread:
> 
>> For instance, I'm thinking skipna=1 is the natural default for the masked arrays.
> 
> The current implementation
> ---------------------------------------
> 
> What we have now is an implementation of masked arrays, but more
> tightly integrated into the numpy core.  In our language we have an
> implementation of IGNORED that is tuned to be nearly indistinguishable
> from the behavior we are expecting of ABSENT.
> 
> Specifically, once you have done this:
> 
> In [9]: a = np.array([99, 2], maskna=True)
> 
> you can get something representing the mask:
> 
> In [11]: np.isna(a)
> Out[11]: array([False, False], dtype=bool)
> 
> but I believe there is no way of setting the mask directly.  In order
> to set the mask, you have to do what looks like an assignment:
> 
> In [12]: a[0] = np.NA
> In [14]: a
> Out[14]: array([NA, 2])
> 
> In fact, what has happened is the mask has changed, but the underlying
> value has not:
> 
> In [18]: orig = np.array([99, 2])
> 
> In [19]: a = orig.view(maskna=True)
> 
> In [20]: a[0] = np.NA
> 
> In [21]: a
> Out[21]: array([NA, 2])
> 
> In [22]: orig
> Out[22]: array([99,  2])
> 
> This is different from real assignment:
> 
> In [23]: a[0] = 0
> 
> In [24]: a
> Out[24]: array([0, 2], maskna=True)
> 
> In [25]: orig
> Out[25]: array([0, 2])
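
As far as I know the maskna keyword is not available in released NumPy versions, so the session above may not be reproducible. A similar distinction between "changing the mask" and "real assignment" can be sketched with np.ma instead. This is a stand-in under that assumption, not the implementation being discussed, and the variable names are illustrative:

```python
import numpy as np

orig = np.array([99, 2])

# With the default copy=False the masked array shares its data with orig.
a = np.ma.masked_array(orig, mask=[False, False])

# Pseudo-assignment: only the mask changes, the underlying value survives.
a[0] = np.ma.masked
orig_after_masking = orig.tolist()
sum_while_masked = a.sum()

# Real assignment: the value is written through to orig and the element is unmasked.
a[0] = 0

print(orig_after_masking, sum_while_masked, orig.tolist())
```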
> 
> Some effort has gone into making it difficult to pull off the mask:
> 
> In [30]: a.view(np.int64)
> Out[30]: array([NA, 2])
> 
> In [31]: a.view(np.int64).flags
> Out[31]:
>  C_CONTIGUOUS : True
>  F_CONTIGUOUS : True
>  OWNDATA : False
>  MASKNA : True
>  OWNMASKNA : False
>  WRITEABLE : True
>  ALIGNED : True
>  UPDATEIFCOPY : False
> 
> In [32]: a.astype(np.int64)
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> /home/mb312/<ipython-input-32-e7f3381c9692> in <module>()
> ----> 1 a.astype(np.int64)
> 
> ValueError: Cannot assign NA to an array which does not support NAs
> 
> The default behavior of the masked values is PROPAGATE, but they can
> be individually made to SKIP:
> 
> In [28]: a.sum() # PROPAGATE
> Out[28]: NA(dtype='int64')
> 
> In [29]: a.sum(skipna=True) # SKIP
> Out[29]: 2
> 
> Where's the beef?
> -------------------------
> 
> I personally still think that it is confusing to fuse the concept of:
> 
> 1) Masked arrays
> 2) Arrays with bitpattern codes for missing
> 
> and the concepts of
> 
> A) ABSENT and
> B) IGNORED
> 
> Consequences for current code
> --------------------------------------------
> 
> Specifically, it still seems to me to make sense to prefer this:
> 
> >>> a = np.array([99, 2], masking=True)
> >>> a.mask
> [ True, True ]
> >>> a.sum()
> 101
> >>> a.mask[0] = False
> >>> a.sum()
> 2
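
To make the preferred interface above concrete, here is a minimal sketch of an array wrapper with an explicit, directly writable mask. The class name IgnoringArray and its attributes are hypothetical, invented for this illustration; nothing like it exists in NumPy. It follows the convention of the example above, where True in the mask means the value is visible:

```python
import numpy as np


class IgnoringArray:
    """Hypothetical wrapper for illustration only; not a NumPy class."""

    def __init__(self, data):
        self.data = np.asarray(data)
        # True means "visible", False means IGNORED (assumed convention).
        self.mask = np.ones(self.data.shape, dtype=bool)

    def sum(self):
        # SKIP behavior: only visible values take part in the reduction.
        return self.data[self.mask].sum()


a = IgnoringArray([99, 2])
total_all = a.sum()

# The mask is set directly, with no assignment-like syntax on the data.
a.mask[0] = False
total_skipping = a.sum()

print(total_all, total_skipping, a.data.tolist())
```

Because the mask is an ordinary boolean array, setting a.mask[0] back to True brings the original 99 back, which is the sense in which the value is IGNORED rather than ABSENT.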
> 
> It might make sense, as Chuck suggests, to change the default to
> 'skipna=True', and I'd further suggest renaming np.NA to np.IGNORED
> and 'skipna' to 'skipignored' for clarity.
> 
> I still think the pseudo-assignment:
> 
> In [20]: a[0] = np.NA
> 
> is confusing, and should be removed.
> 
> Later, should we ever have bitpatterns, there would be something like
> np.ABSENT.  This of course would make sense for assignment:
> 
> In [20]: a[0] = np.ABSENT
> 
> There would be another keyword argument 'skipabsent=False' such that,
> when this is False, the ABSENT values propagate.
> 
> Honestly, I think that NA should be a synonym for ABSENT, and so
> should be removed until the dust has settled, and restored as (np.NA
> == np.ABSENT)
> 
> And I think these two ideas, of masking / IGNORED and bitpattern /
> ABSENT, would be much easier to explain.
> 
> That's my best shot.
> 
> Matthew

---
Travis Oliphant
Enthought, Inc.
oliphant@enthought.com
1-512-536-1057
http://www.enthought.com




