[Numpy-discussion] alterNEP - was: missing data discussion round 2

Nathaniel Smith njs@pobox....
Thu Jun 30 12:51:08 CDT 2011


On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
> In the interest of making the discussion as concrete as possible, here
> is my draft of an alternative proposal for NAs and masking, based on
> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
> right, that the ideas become much clearer when the NA idea and the
> MASK idea are separate.   Please do pitch in for things I may have
> missed or misunderstood:
[...]

Thanks for writing this up! I stuck it up as a gist so we can edit it
more easily:
  https://gist.github.com/1056379/
This is your initial version:
  https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
And I made a few changes:
  https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
Specifically, I added a rationale section, changed np.MASKED to
np.IGNORE (as per comments in this thread), and added a vowel to
"propmsk".

One thing I wonder about the design is whether having an
np.MASKED/np.IGNORE value at all helps or hurts. (Occam tells us never
to multiply entities without necessity! And it's a bit of an odd fit
to the masking concept, since the whole idea is that masking is a
property of the array, not of the individual elements.)

Currently, I see the following uses for it:
  -- As a return value when someone tries to scalar-index a masked value
  -- As a placeholder to specify masked values when creating an array
from a list (but not when assigning to an array later)
  -- As a return value when using propmask=True
  -- As something to display when printing a masked array
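
For comparison, here is roughly how the existing numpy.ma module handles
three of these cases with its ma.masked singleton. This is only a sketch
of current numpy.ma behaviour, not of the proposed API, and the propmask
case is left out; the variable names are just for illustration.

```python
import numpy as np
import numpy.ma as ma

# Placeholder when creating an array from a list
a = ma.array([1, ma.masked, 3])
mask_list = a.mask.tolist()

# Scalar-indexing a masked element hands back the ma.masked singleton
scalar_is_placeholder = a[1] is ma.masked

# Printing shows the masked element as "--"
printed = str(a)
print(printed)
```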

Another way of doing things would be:
  -- Scalar-indexing a masked value returns an error, like trying to
index past the end of an array. (Slicing etc. would still return a new
masked array.)
  -- Having some sort of placeholder does seem nice, but I'm not sure
how often you need to type out a masked array. And I notice that
numpy.ma does support this (like so: ma.array([1, ma.masked, 3])) but
the examples in the docs never use it. The replacement idiom would be
something like: my_data = np.array([1, 999, 3], masked=True);
my_data.visible = (my_data != 999). So maybe just leave out the
placeholder value, at least for version 1?
  -- I don't really see the logic for supporting 'propmask' at all.
AFAICT no-one has ever even considered this as a useful feature for
numpy.ma, never mind implemented it?
  -- When printing, the numpy.ma approach of using "--" seems much
more readable to me than having "IGNORE" all over my screen.
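
To make the alternative concrete, here is a rough sketch of the
behaviour described above. Everything in it is hypothetical: the
VisibleArray class, its .data attribute and the error message are
made up for illustration, and it only handles 1-d printing. It is not
the proposed implementation, just a toy with the same semantics (no
placeholder value, an error on scalar-indexing a masked element,
slices stay masked, "--" when printing).

```python
import numpy as np


class VisibleArray:
    """Hypothetical toy: data plus a boolean 'visible' array, no placeholder scalar."""

    def __init__(self, data, visible=None):
        self.data = np.asarray(data)
        if visible is None:
            # By default every element is visible (nothing masked)
            visible = np.ones(self.data.shape, dtype=bool)
        self.visible = np.asarray(visible, dtype=bool)

    def __getitem__(self, index):
        data = self.data[index]
        visible = self.visible[index]
        if np.ndim(data) == 0:
            # Scalar indexing: raise instead of returning a placeholder
            if not visible:
                raise IndexError("element at index %r is masked" % (index,))
            return data
        # Slicing etc. still returns a new masked array
        return VisibleArray(data, visible)

    def __str__(self):
        # 1-d only: show masked elements as "--", like numpy.ma does
        items = [str(d) if v else "--" for d, v in zip(self.data, self.visible)]
        return "[" + " ".join(items) + "]"


# The sentinel idiom: build from raw data, then hide the 999s
my_data = VisibleArray([1, 999, 3])
my_data.visible = (my_data.data != 999)

printed = str(my_data)
first = my_data[0]
tail = str(my_data[1:])

error_message = None
try:
    my_data[1]
except IndexError as exc:
    error_message = str(exc)
```

Note that the toy compares against my_data.data rather than my_data
itself, since it does not define comparison operators.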

So overall, making these changes would let us simplify the design. But
maybe propmask is really critical for some use case, or there's some
good reason to want to scalar-index missing values without getting an
error?

-- Nathaniel

