[Numpy-discussion] alterNEP - was: missing data discussion round 2

Matthew Brett matthew.brett@gmail....
Thu Jun 30 12:58:59 CDT 2011


Hi,

On Thu, Jun 30, 2011 at 6:51 PM, Nathaniel Smith <njs@pobox.com> wrote:
> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> In the interest of making the discussion as concrete as possible, here
>> is my draft of an alternative proposal for NAs and masking, based on
>> Nathaniel's comments.  Writing it, it seemed to me that Nathaniel is
>> right, that the ideas become much clearer when the NA idea and the
>> MASK idea are separate.   Please do pitch in for things I may have
>> missed or misunderstood:
> [...]
>
> Thanks for writing this up! I stuck it up as a gist so we can edit it
> more easily:
>  https://gist.github.com/1056379/
> This is your initial version:
>  https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191
> And I made a few changes:
>  https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583
> Specifically, I added a rationale section, changed np.MASKED to
> np.IGNORE (as per comments in this thread), and added a vowel to
> "propmsk".

Thanks for doing that.

> One thing I wonder about the design is whether having an
> np.MASKED/np.IGNORE value at all helps or hurts. (Occam tells us never
> to multiply entities without necessity! And it's a bit of an odd fit
> to the masking concept, since the whole idea is that masking is a
> property of the array, not the individual datums.)
>
> Currently, I see the following uses for it:
>  -- As a return value when someone tries to scalar-index a masked value
>  -- As a placeholder to specify masked values when creating an array
> from a list (but not when assigning to an array later)
>  -- As a return value when using propmask=True
>  -- As something to display when printing a masked array
>
> Another way of doing things would be:
>  -- Scalar-indexing a masked value returns an error, like trying to
> index past the end of an array. (Slicing etc. would still return a new
> masked array.)
>  -- Having some sort of placeholder does seem nice, but I'm not sure
> how often you need to type out a masked array. And I notice that
> numpy.ma does support this (like so: ma.array([1, ma.masked, 3])) but
> the examples in the docs never use it. The replacement idiom would be
> something like: my_data = np.array([1, 999, 3], masked=True);
> my_data.visible = (my_data != 999). So maybe just leave out the
> placeholder value, at least for version 1?
>  -- I don't really see the logic for supporting 'propmask' at all.
> AFAICT no-one has ever even considered this as a useful feature for
> numpy.ma, never mind implemented it?
>  -- When printing, the numpy.ma approach of using "--" seems much
> more readable to me than having "IGNORE" all over my screen.
>
> So overall, making these changes would let us simplify the design. But
> maybe propmask is really critical for some use case, or there's some
> good reason to want to scalar-index missing values without getting an
> error?
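
For reference, the four uses listed above can be checked against what numpy.ma does today. This is only a sketch of existing numpy.ma behaviour, not of the proposed API: the `np.array(..., masked=True)` constructor and the `.visible` attribute in the quoted idiom are hypothetical, so the last two lines approximate that idiom with `ma.array(data, mask=...)` instead.

```python
import numpy as np
import numpy.ma as ma

# Use 2: ma.masked as a placeholder when building an array from a list.
a = ma.array([1, ma.masked, 3])

# Use 1: scalar-indexing a masked element returns the ma.masked singleton
# rather than raising an error.
elem = a[1]
returns_singleton = elem is ma.masked

# Slicing still returns a new masked array.
sub = a[1:]

# Use 4: printing shows "--" for masked elements.
text = str(a)

# Approximation of the quoted replacement idiom using numpy.ma only:
# mark the sentinel 999 as masked instead of typing a placeholder.
data = np.array([1, 999, 3])
b = ma.array(data, mask=(data == 999))
```

There is no numpy.ma counterpart to use 3 (propmask), which is consistent with the remark above that it was never implemented there.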

I'm afraid, like you, I'm a little lost in the world of masking,
because I only need the NAs.  I was trying to see if I could come up
with an API that picked up some of the syntactic convenience of NAs,
without conflating NAs with IGNOREs.   I guess we need some feedback
from the 'NA & IGNORE Share the API' (NISA?) proponents to get an idea
of what we've missed.  @Mark, @Chuck, guys - what have we lost here by
separating the APIs?
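
To make the NA versus IGNORE distinction concrete, here is a rough sketch using tools that exist today. It rests on two assumptions that are not part of any proposal: `np.nan` stands in for an NA (a value that is really gone and propagates through reductions), and a numpy.ma mask stands in for IGNORE (a value that is hidden, skipped by reductions, and recoverable).

```python
import numpy as np
import numpy.ma as ma

values = np.array([1.0, 2.0, 3.0])

# NA-like stand-in (assumption): the value is overwritten and the
# missingness propagates into the result of a reduction.
with_na = values.copy()
with_na[1] = np.nan
na_total = with_na.sum()

# IGNORE-like stand-in (assumption): the value is hidden by a mask that
# belongs to the array, and reductions skip it.
with_ignore = ma.array(values, mask=[False, True, False])
ignore_total = with_ignore.sum()

# Clearing the mask makes the hidden value visible again.
with_ignore.mask = ma.nomask
restored_total = with_ignore.sum()
```

Here `na_total` is NaN, while `ignore_total` counts only the visible elements and `restored_total` counts all three once the mask is cleared.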

See you,

Matthew

