[Numpy-discussion] Masked Arrays in NumPy 1.x
Tue Apr 10 02:16:01 CDT 2012
On 04/09/2012 06:52 PM, Travis Oliphant wrote:
> Hey all,
> I've been waiting for Mark Wiebe to arrive in Austin where he will
> spend several weeks, but I also know that masked arrays will be only
> one of the things he and I are hoping to make head-way on while he is
> in Austin. Nevertheless, we need to make progress on the masked
> array discussion and if we want to finalize the masked array
> implementation we will need to finish the design.
> I've caught up on most of the discussion including Mark's NEP,
> Nathaniel's NEP and other writings and the very-nice mailing list
> discussion that included a somewhat detailed discussion on the
> algebra of IGNORED. I think there are some things still to be
> decided. However, I think some things are pretty clear:
> 1) Masked arrays are going to be fundamental in NumPy and these
> should replace most people's use of numpy.ma. The numpy.ma code
> will remain as a compatibility layer
Excellent! In mpl and other heavy users of numpy.ma there will still be
work to do to handle all varieties of input, but it should be manageable.
> 2) The reality of #1 and NumPy's general philosophy to date means
> that masked arrays in NumPy should support the common use-cases of
> masked arrays (including getting and setting of the mask from the
> Python and C-layers). However, the semantic of what the mask implies
> may change from what numpy.ma uses to having a True value meaning
I never understood a strong argument for that change from numpy.ma.
When editing data, it is natural to use flag bits to indicate various
rejection criteria; no bit set means it's all good, so a False is
naturally "good" and True is naturally "mask it out". But I can live
with the change if you and Mark see a good reason for it.
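To make the flag-bit argument concrete, here is a minimal sketch using the existing numpy.ma convention, where True means "mask it out". The flag names and sample values are hypothetical and only serve as illustration.

```python
import numpy as np

# Hypothetical quality-control flags, one bit per rejection criterion.
SPIKE = 0x1
OUT_OF_RANGE = 0x2

data = np.array([1.0, 2.0, 99.0, 4.0])
flags = np.array([0, SPIKE, OUT_OF_RANGE, 0], dtype=np.uint8)

# numpy.ma convention: True means "mask it out", so any set bit masks the sample.
masked = np.ma.masked_array(data, mask=(flags != 0))

valid_sum = masked.sum()   # only unflagged samples contribute
n_valid = masked.count()   # number of unmasked samples
```

With the opposite convention (True meaning "valid"), the same mask would have to be written as `flags == 0`.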
> 3) There will be missing-data dtypes in NumPy. Likely
> only a limited sub-set (string, bytes, int64, int32, float32,
> float64, complex64, complex32, and object) with an API that allows
> more to be defined if desired. These will most likely use Mark's
> nice machinery for managing the calculation structure without
> requiring new C-level loops to be defined.
So, these will be the bit-pattern versions of NA, correct? With the bit
pattern specified as an attribute of the dtype? Good, but...
Are we getting into trouble here, figuring out how to handle all
combinations of numpy.ma, masked dtypes, and Mark's masked NA?
> 4) I'm still not sure about whether the IGNORED concept is necessary
> or not. I really like the separation that was emphasized between
> implementation (masks versus bit-patterns) and operations
> (propagating versus non-propagating). Pauli even created another
> dimension which I don't totally grok and therefore can't remember.
> Pauli? Do you still feel that is a necessary construction? But, do
> we need the IGNORED concept to indicate what amounts to different
> default key-word arguments to functions that operate on NumPy arrays
> containing missing data (however that is represented)? My current
> weak view is that it is not really necessary. But, I could be
> convinced otherwise.
I agree (if I understand you correctly); the goal is an expressive,
explicit language that lets people accomplish what they want, clearly
and quickly, and I think this is more a matter of practicality than
purity of theory. Nevertheless, achieving that is easier said than
done, and figuring out how to handle corner cases is better done sooner than later.
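As a rough illustration of the propagating versus non-propagating distinction, the sketch below uses only what exists in NumPy today. NaN stands in for a bit-pattern missing value; this is an assumption made for the example, not the proposed NA design.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])   # NaN stands in for a missing value

# Propagating: one missing input makes the whole result missing.
propagated = np.sum(values)

# Non-propagating: missing inputs are skipped.
skipped = np.nansum(values)

# The same data represented with a mask instead of a bit pattern;
# numpy.ma reductions skip masked entries.
masked = np.ma.masked_invalid(values)
skipped_ma = masked.sum()
```

The representation (bit pattern or mask) and the behavior (propagate or skip) vary independently here, which is the separation the quoted paragraph describes.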
Numpy.ma has never been perfect, but it has proven a good tool for
practical work in my experience. (Many thanks to Pierre GM for all his
work on it.) One of the nice things it does is to automatically mask out
invalid results. This saves quite a bit of explicit checking that would
otherwise be required.
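A small example of that automatic masking, using the domain-aware functions in numpy.ma (the input values are made up for illustration):

```python
import numpy as np

x = np.ma.array([4.0, -1.0, 0.0, 9.0])

# The square root of a negative number is outside the domain, so it is masked.
roots = np.ma.sqrt(x)

# Division by zero is masked instead of producing inf or a warning.
inverse = np.ma.divide(1.0, x)
```

Without numpy.ma, each of these would need an explicit check such as `x >= 0` or `x != 0` before the operation.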
> I think the good news is that given Mark's hard-work and Nathaniel's
> follow-up we are really quite far along. I would love to get
> Nathaniel's opinion about what remains un-done in the current NumPy
> code-base. I would also appreciate knowing (from anyone with an
> interest) opinions of items 1-4 above and anything else I've left