[Numpy-discussion] A crazy masked-array thought

Charles R Harris charlesr.harris@gmail....
Sat Apr 28 22:18:55 CDT 2012


On Sat, Apr 28, 2012 at 10:58 AM, Neal Becker <ndbecker2@gmail.com> wrote:

> Nathaniel Smith wrote:
>
> > On Sat, Apr 28, 2012 at 7:38 AM, Richard Hattersley
> > <rhattersley@gmail.com> wrote:
> >> So, assuming numpy.ndarray became a strict subclass of some new masked
> >> array, it looks plausible that adding just a few checks to numpy.ndarray
> >> to exclude the masked superclass would prevent much downstream code from
> >> accidentally operating on masked arrays.
> >
> > I think the main point I was trying to make is that it's the existence
> > and content of these checks that matters. They don't necessarily have
> > any relation at all to which thing Python calls a "superclass" or a
> > "subclass".
> >
> > -- Nathaniel
>
> I don't agree with the argument that ma should be a superclass of ndarray.
> It is ma that is adding features.  That makes it a subclass.  We're not
> talking mathematics here.
>

It isn't a subclass either. In a true subclass, anything that worked on the
base class would work equally well on the subclass *without modification*.
Basically, it's an independent class with special functions that can handle
combinations and ufuncs. Look at all the functions exported in
numpy/ma/core.py. Inheritance really isn't a concept appropriate to this
case. Pretty much all the functions are rewritten for masked arrays, which
is one reason maintenance is a hassle: lots of things have to be maintained
in two places.
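
A minimal sketch of the substitutability problem, assuming a hypothetical
downstream helper `total()` written with plain arrays in mind (the helper
name is made up for illustration; `np.asarray` and `np.ma.masked_array` are
the real NumPy calls):

```python
import numpy as np

def total(a):
    # Hypothetical downstream code: np.asarray() returns a base-class
    # ndarray, so any mask on the input is silently dropped here.
    return np.asarray(a).sum()

m = np.ma.masked_array([1, 2, 3], mask=[False, True, False])

is_subclass = isinstance(m, np.ndarray)  # True: MaskedArray inherits from ndarray
plain_total = total(m)                   # mask ignored: 1 + 2 + 3
masked_total = m.sum()                   # mask honoured: 1 + 3
```

The instance passes the isinstance check, yet code that worked on the base
class gives a different answer from the masked method, so it did not carry
over "without modification".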

> There is a well-known disease of OOP where everything seems to bubble up
> to the top of the class hierarchy - so that the base class becomes bloated
> to support every feature needed by subclasses.  I believe that's considered
> poor design.
>
> Is there a way to support ma as a subclass of ndarray, without introducing
> overhead into ndarray?  Without having given this much real thought, I do
> have some idea.  What are the operations that we need on arrays?  The most
> basic are:
>
> 1. element access
> 2. get size (shape)
>
> In an OO design, these would be virtual functions (or in C, pointers to
> functions).  But this would introduce unacceptable overhead.
>
>
Sure, and you would still have two different functions of almost everything.


> In a generic programming design (c++ templates), we would essentially
> generate 2 copies of every function, one that operates on plain arrays,
> and one that operates on masked arrays, each using the appropriate function
> for element access, shape, etc.  This way, no unneeded overhead is
> introduced (although the code size is increased - but this is probably of
> little consequence on a modern demand-paged OS).
>
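
A rough Python analogue of the "two copies of every function" idea, only as
a sketch: closures stand in for template instantiation, so nothing here is
resolved at compile time the way C++ templates would be. The names
`make_mean`, `plain_mean` and `masked_mean` are hypothetical;
`MaskedArray.compressed()` is the real method that returns the unmasked
values as a 1-D ndarray.

```python
import numpy as np

def make_mean(get_values):
    """Build a mean() specialised for one kind of array.

    get_values plays the role of the per-type element-access function.
    """
    def mean(arr):
        values = get_values(arr)
        return values.sum() / values.size
    return mean

# One "instantiation" per array kind, each with its own element access.
plain_mean = make_mean(lambda a: np.asarray(a).ravel())
masked_mean = make_mean(lambda m: m.compressed())

data = [1.0, 2.0, 6.0]
plain_result = plain_mean(np.array(data))
masked_result = masked_mean(np.ma.masked_array(data, mask=[False, True, False]))
```

The algorithm is written once, but there are still two functions to name,
export and document, one per array kind.
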
> Following this approach, ma and ndarray don't have to have any inheritance
> relation.  OTOH, inheritance is probably useful since there are many common
> features to ma and ndarray, and a lot of code could be shared.
>

Not many common behaviours. Analogous behaviours, perhaps. And since
everything ends up written twice, the best way to share code is to do it in
the base class.
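
As a toy sketch of what sharing in the base class could look like (the
classes `MaybeMasked` and `Plain` are made up for illustration and are not
anything in numpy): the base carries an optional mask and implements each
operation once, and the plain array is just the no-mask special case.

```python
import numpy as np

class MaybeMasked:
    """Hypothetical base: data plus an optional boolean mask (True = ignore)."""

    def __init__(self, data, mask=None):
        self.data = np.asarray(data)
        self.mask = None if mask is None else np.asarray(mask, dtype=bool)

    def sum(self):
        # Single implementation shared by masked and plain arrays.
        if self.mask is None:
            return self.data.sum()
        return self.data[~self.mask].sum()

class Plain(MaybeMasked):
    """Hypothetical unmasked specialisation: never carries a mask."""

    def __init__(self, data):
        super().__init__(data, mask=None)

masked_sum = MaybeMasked([1, 2, 3], mask=[False, True, False]).sum()
plain_sum = Plain([1, 2, 3]).sum()
```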

Chuck