[Numpy-discussion] A crazy masked-array thought
Sat Apr 28 01:38:27 CDT 2012
On 27 April 2012 17:42, Travis Oliphant <email@example.com> wrote:
> 1) There is a lot of code out there that does not know anything about
> masks and is not used to checking for masks. It enlarges the basic
> abstraction in a way that is not backwards compatible *conceptually*.
> This smells fishy to me and I could see a lot of downstream problems from
> libraries that rely on NumPy.
That's exactly why I'd love to see plain arrays remain functionally
unchanged.
It's just a small, random sample, but here's how a few routines from NumPy
and SciPy sanitise their inputs...
numpy.trapz (aka scipy.integrate.trapz) - numpy.asanyarray
scipy.spatial.KDTree - numpy.asarray
scipy.spatial.cKDTree - numpy.ascontiguousarray
scipy.integrate.odeint - PyArray_ContiguousFromObject
scipy.interpolate.interp1d - numpy.array
scipy.interpolate.griddata - numpy.asanyarray & numpy.ascontiguousarray
So, assuming numpy.ndarray became a strict subclass of some new masked
array, it looks plausible that adding just a few checks to numpy.ndarray to
exclude the masked superclass would prevent much downstream code from
accidentally operating on masked arrays.
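
For comparison, here is a small sketch of how those sanitisers treat today's numpy.ma arrays. Note the caveat: numpy.ma.MaskedArray is currently a *subclass* of numpy.ndarray, which is the opposite of the hierarchy proposed here, so this only shows which routines coerce to the base class and which pass other classes through. The sample values are made up for illustration.

```python
import numpy as np

# A masked array from today's numpy.ma (a subclass of numpy.ndarray).
m = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# These sanitisers coerce to the base class, so the mask is dropped.
as_plain = np.asarray(m)
as_contig = np.ascontiguousarray(m)
as_copy = np.array(m)

# asanyarray lets ndarray subclasses through untouched.
as_any = np.asanyarray(m)

for name, result in [("asarray", as_plain),
                     ("ascontiguousarray", as_contig),
                     ("array", as_copy),
                     ("asanyarray", as_any)]:
    print(f"{name:>18}: {type(result).__name__}")
```

Routines built on numpy.asanyarray are the ones most likely to need an explicit check if a masked superclass were introduced.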
> 2) We cannot agree on how masks should be handled and consequently don't
> have a real plan for migrating numpy.ma to use these masks. So, we are
> just growing the API and introducing uncertainty for unclear benefit ---
> especially for the person that does not want to use masks.
I've not yet looked at how numpy.ma users could be migrated. But if we make
masked arrays a strict superclass and leave the numpy/ndarray interface and
behaviour unchanged, API growth shouldn't be an issue. End-users will be
able to completely ignore the existence of masked arrays (except for the
minority(?) for whom the ABI/re-compile issue would be relevant).
> 3) Subclassing in C in Python requires that C-structures are *binary*
> compatible. This implies that all subclasses have *more* attributes than
> the superclass. The way it is currently implemented, that means that POAs
> would have these extra pointers they don't need sitting there to satisfy
> that requirement. From a C-struct perspective it therefore makes more
> sense for MAs to inherit from POAs. Ideally, that shouldn't drive the
> design, but it's part of the landscape in NumPy 1.X
I'd hate to see the logical class hierarchy inverted (or collapsed to a
single class) just to save a pointer or two from the struct. Now seems like
a golden opportunity to fix the relationship between masked and plain
arrays. I'm assuming (and implicitly checking that assumption with this
statement!) that there's far more code using the Python interface to NumPy
than there is code using the C interface. So I'm urging that the logical
consistency of the Python interface (and even the C and Cython interfaces)
take precedence over the C-struct memory saving.
I'm not sure I agree with "extra pointers they don't need". If we make
plain arrays a subclass of masked arrays, aren't these pointers essential
to ensure masked array methods can continue to work on plain arrays without
requiring special code paths?
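
To make the binary-compatibility constraint from point 3 concrete, here is a rough sketch using ctypes. The struct and field names are hypothetical and are not NumPy's real C layout; the only point is that a C-level subclass has to start with its base's fields, so whichever class sits lower in the hierarchy carries a superset of them.

```python
import ctypes

# Ordering A: plain array is the base, masked array is the subclass.
class PlainA(ctypes.Structure):
    # Hypothetical fields, for illustration only.
    _fields_ = [("data", ctypes.c_void_p),
                ("nd", ctypes.c_int)]

class MaskedA(PlainA):
    # The subclass appends its mask pointer after the inherited fields.
    _fields_ = [("mask", ctypes.c_void_p)]

# Ordering B: masked array is the base, plain array is the subclass.
class MaskedB(ctypes.Structure):
    _fields_ = [("data", ctypes.c_void_p),
                ("nd", ctypes.c_int),
                ("mask", ctypes.c_void_p)]

class PlainB(MaskedB):
    # Adds nothing, but cannot drop the inherited mask pointer.
    pass

print("A: plain", ctypes.sizeof(PlainA), "masked", ctypes.sizeof(MaskedA))
print("B: plain", ctypes.sizeof(PlainB), "masked", ctypes.sizeof(MaskedB))
```

In ordering B the plain struct is as large as the masked one. That is the cost being weighed above against a consistent class hierarchy.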
> I have some ideas about how to move forward, but I'm anxiously awaiting
> the write-up that Mark and Nathaniel are working on to inform and enhance
> those ideas.
As an aside, the implication of preserving the behaviour of the
numpy/ndarray interface is that masked arrays will need a *new* interface.
>>> import mumpy  # Yes - I know it's a terrible name! But I had to write *something* ... sorry! ;-)
>>> import numpy
>>> a = mumpy.array(...) # makes a masked array
>>> b = numpy.array(...) # makes a plain array
>>> isinstance(a, mumpy.ndarray)
>>> isinstance(b, mumpy.ndarray)
>>> isinstance(a, numpy.ndarray)
>>> isinstance(b, numpy.ndarray)
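
Since mumpy doesn't exist, here is a minimal pure-Python sketch of the relationship being proposed. MaskedNDArray and PlainNDArray are hypothetical stand-ins for mumpy.ndarray and numpy.ndarray, and as_plain is an illustrative guard of the sort a sanitiser would need in order to exclude the masked superclass. None of these names is a real API.

```python
class MaskedNDArray:
    """Hypothetical stand-in for mumpy.ndarray (the masked superclass)."""

    def __init__(self, data, mask=None):
        self.data = list(data)
        # Default to "nothing masked" when no mask is supplied.
        self.mask = list(mask) if mask is not None else [False] * len(self.data)


class PlainNDArray(MaskedNDArray):
    """Hypothetical stand-in for numpy.ndarray: a masked array with no mask set."""

    def __init__(self, data):
        super().__init__(data)


def as_plain(obj):
    """Illustrative guard: accept plain arrays, reject the masked superclass."""
    if not isinstance(obj, PlainNDArray):
        raise TypeError("expected a plain array, got %s" % type(obj).__name__)
    return obj


a = MaskedNDArray([1, 2, 3], mask=[False, True, False])  # masked array
b = PlainNDArray([1, 2, 3])                              # plain array

print(isinstance(a, MaskedNDArray))  # True
print(isinstance(b, MaskedNDArray))  # True
print(isinstance(a, PlainNDArray))   # False
print(isinstance(b, PlainNDArray))   # True
```

With this arrangement, code written against the masked interface accepts both kinds of array, while code that checks for the plain class never sees a masked one.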