[Numpy-discussion] A crazy masked-array thought
Charles R Harris
Sat Apr 28 22:18:55 CDT 2012
On Sat, Apr 28, 2012 at 10:58 AM, Neal Becker <firstname.lastname@example.org> wrote:
> Nathaniel Smith wrote:
> > On Sat, Apr 28, 2012 at 7:38 AM, Richard Hattersley
> > <email@example.com> wrote:
> >> So, assuming numpy.ndarray became a strict subclass of some new masked
> >> array, it looks plausible that adding just a few checks to numpy.ndarray
> >> to exclude the masked superclass would prevent much downstream code from
> >> accidentally operating on masked arrays.
> > I think the main point I was trying to make is that it's the existence
> > and content of these checks that matters. They don't necessarily have
> > any relation at all to which thing Python calls a "superclass" or a
> > "subclass".
> > -- Nathaniel
> I don't agree with the argument that ma should be a superclass of ndarray.
> It is ma that is adding features. That makes it a subclass. We're not
> talking mathematics here.
It isn't a subclass either. In a true subclass, anything that worked on the
base class would work equally well on a subclass *without modification*.
Basically, it's an independent class with special functions that can handle
combinations and ufuncs. Look at all the functions exported in
numpy/ma/core.py. Inheritance really isn't a concept appropriate to this
case. Pretty much all the functions are rewritten for masked arrays, which
is one reason maintenance is a hassle: lots of things have to be maintained
in two places.
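
As a minimal illustration of the "without modification" point, assuming a
reasonably recent NumPy: Python reports a masked array as an ndarray
instance, yet code that normalises its input with np.asarray() loses the
mask and silently computes a different answer. The variable names below are
illustrative only.

```python
import numpy as np

# A masked array with the middle element masked out.
m = np.ma.masked_array([1, 2, 3], mask=[False, True, False])

# Python's type system does report MaskedArray as an ndarray subclass.
is_subclass = isinstance(m, np.ndarray)

# The mask-aware method skips the masked element.
masked_total = m.sum()

# np.asarray() returns a plain ndarray and drops the mask, so code written
# for ndarray that normalises its input this way includes the masked value.
plain_total = np.asarray(m).sum()

print(is_subclass, masked_total, plain_total)
```

The two totals differ, so a routine written against ndarray does not
automatically do the right thing when handed a masked array.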
> There is a well-known disease of OOP where everything seems to bubble up
> to the top of the class hierarchy - so that the base class becomes bloated to
> support every feature needed by subclasses. I believe that's considered poor
> design.
> Is there a way to support ma as a subclass of ndarray, without introducing
> overhead into ndarray? Without having given this much real thought, I do
> have some idea. What are the operations that we need on arrays? The most
> basic are:
> 1. element access
> 2. get size (shape)
> In an OO design, these would be virtual functions (or in C, pointers to
> functions). But this would introduce unacceptable overhead.
Sure, and you would still have two different versions of almost every function.
> In a generic programming design (C++ templates), we would essentially
> generate 2 copies of every function, one that operates on plain arrays, and
> one that operates on masked arrays, each using the appropriate function for
> element access, shape, etc. This way, no unneeded overhead is introduced
> (although the code size is increased - but this is probably of little
> consequence on a demand-paged OS).
> Following this approach, ma and ndarray don't have to have any inheritance
> relation. OTOH, inheritance is probably useful since there are many common
> features to ma and ndarray, and a lot of code could be shared.
Not many common behaviours. Analogous behaviours, perhaps. And since
everything ends up written twice, the best way to share code is to do it in
the base class.
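
The generic-programming idea quoted above (one algorithm body, instantiated
once per kind of array) can be roughly sketched in Python with a function
factory. This is only an analogy to C++ templates: the names make_total,
iter_plain, and iter_masked are hypothetical, and a Python closure does not
remove per-element call overhead the way compile-time instantiation would.

```python
import numpy as np


def make_total(iter_valid):
    """Build a summation routine specialised for one kind of element access."""
    def total(arr):
        result = 0
        for value in iter_valid(arr):
            result += value
        return result
    return total


def iter_plain(arr):
    # Plain arrays: every element is valid.
    return iter(np.asarray(arr).ravel())


def iter_masked(arr):
    # Masked arrays: compressed() returns only the unmasked elements.
    return iter(np.ma.asarray(arr).compressed())


# Two "instantiations" of the same algorithm body.
total_plain = make_total(iter_plain)
total_masked = make_total(iter_masked)

data = np.array([1, 2, 3])
masked = np.ma.masked_array(data, mask=[False, True, False])
print(total_plain(data), total_masked(masked))
```

The algorithm is written once and only the element-access policy differs,
which is the kind of sharing the quoted proposal is after.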