[Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)
Travis Oliphant
travis@continuum...
Mon Apr 16 23:38:44 CDT 2012
On Apr 16, 2012, at 11:01 PM, Charles R Harris wrote:
>
>
> On Mon, Apr 16, 2012 at 8:46 PM, Travis Oliphant <travis@continuum.io> wrote:
>
> On Apr 16, 2012, at 8:03 PM, Matthew Brett wrote:
>
> > Hi,
> >
> > On Mon, Apr 16, 2012 at 3:06 PM, Travis Oliphant <travis@continuum.io> wrote:
> >
> >> I have heard from a few people that they are not excited by the growth of
> >> the NumPy data-structure by the 3 pointers needed to hold the masked-array
> >> storage. This is especially true when there is talk of potentially adding
> >> additional attributes to the NumPy array (for labels and other
> >> meta-information). If you are willing to let us know how you feel about
> >> this, please speak up.
> >
> > I guess there are two questions here
> >
> > 1) Will something like the current version of masked arrays have a
> > long term future in numpy, regardless of eventual API? Most likely
> > answer - yes?
>
> I think the answer to this is yes, but it could be as a feature-filled sub-class (like the current numpy.ma, except in C).
>
> I think making numpy.ma a subclass of ndarray has caused all sorts of trouble. It doesn't satisfy 'is a', rather it tries to use inheritance from ndarray for implementation of various parts. The upshot is that almost everything has to be overridden, so it didn't buy much.
This is a valid point. One could create a new object that is binary compatible with the NumPy array and provides the array interface without actually being a sub-class. We could also keep Mark's modifications to the array interface so that it can communicate a mask.
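
As a rough sketch of the idea (in Python rather than C, and with a hypothetical class name `MaskedBuffer`; this is not numpy.ma and not Mark's implementation), an object can carry data plus a mask, avoid inheriting from ndarray, and still hand its data to NumPy through the array interface:

```python
import numpy as np


class MaskedBuffer:
    """Hypothetical container: holds data and a mask, is NOT an ndarray subclass."""

    def __init__(self, data, mask):
        self._data = np.ascontiguousarray(data)
        self.mask = np.asarray(mask, dtype=bool)
        if self.mask.shape != self._data.shape:
            raise ValueError("mask and data must have the same shape")

    @property
    def __array_interface__(self):
        # Reuse the description (shape, typestr, data pointer) of the wrapped array.
        return self._data.__array_interface__


mb = MaskedBuffer([1.0, 2.0, 3.0], [False, True, False])
view = np.asarray(mb)                # NumPy consumes the object via the array interface
unmasked_sum = view[~mb.mask].sum()  # the mask travels alongside as a plain attribute
```

Here the mask is just an attribute on the side; how a mask would actually be communicated through the interface is a separate question that this sketch does not try to answer.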
-Travis
>
>
> > 2) Will likely changes to the masked array API make any difference to
> > the number of extra pointers? Likely answer no?
> >
> > Is that right?
>
> The answer to this is very likely no on the Python side. But, on the C-side, there could be some differences (i.e. are masked arrays a sub-class of the ndarray or not).
>
> >
> > I have the impression that the masked array API discussion still has
> > not come out fully into the unforgiving light of discussion day, but
> > if the answer to 2) is No, then I suppose the API discussion is not
> > relevant to the 3 pointers change.
>
> You are correct that the API discussion is separate from this one. Overall, I was surprised at how fervently people would oppose ABI changes. As has been pointed out, NumPy and Numeric before it were not really designed to prevent having to recompile when changes were made. I still suspect that a better overall solution is to promote better availability of downstream binary packages rather than to worry excessively about ABI changes in NumPy. But, that is the current climate.
>
> In that climate, my concern is that we haven't finalized the API but are rapidly cementing the *structure* of NumPy arrays into a modified form that has real downstream implications. Two other people I have talked to share this concern (neither has posted on this list before, but both are heavy users of NumPy). I may have missed the threads where it was discussed, but have these structure changes and their implications been fully discussed? Is there anyone else who is concerned about adding 3 more pointers (12 bytes on 32-bit builds or 24 bytes on 64-bit builds) to the NumPy structure?
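
For reference, the 12 or 24 bytes are simply three pointers at 4 or 8 bytes each. A quick check for whatever platform you are on (plain ctypes, nothing NumPy specific; the helper `overhead_ratio` is only an illustration and compares the extra bytes against the data bytes of an array, not against the real size of the array struct):

```python
import ctypes

POINTER_SIZE = ctypes.sizeof(ctypes.c_void_p)  # 4 on 32-bit builds, 8 on 64-bit builds
EXTRA_POINTERS = 3                             # the masked-array additions discussed here

extra_bytes = EXTRA_POINTERS * POINTER_SIZE


def overhead_ratio(n_elements, itemsize=8):
    """Extra header bytes relative to the data bytes of an n-element array."""
    return extra_bytes / (n_elements * itemsize)


print(extra_bytes)
print(overhead_ratio(4), overhead_ratio(1_000_000))
```

The ratio shrinks quickly as the array grows, so the extra pointers only matter for code that creates very many small arrays.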
>
> As Chuck points out, 3 more pointers is not necessarily that big of a deal if you are talking about a large array (though for small arrays it could matter). But, I personally know of half-written NEPs that propose to add more pointers to the NumPy array:
>
> * to allow meta-information to be attached to a NumPy array
> * to allow labels to be attached to a NumPy array (à la data-array)
> * to allow multiple chunks for an array.
>
> Are people O.K. with 5 or 6 more pointers on every NumPy array? We could also think about adding just one more pointer to a new "enhanced" structure that contains multiple enhancements to the NumPy array.
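
To make the "one more pointer" option concrete, here is a layout sketch using ctypes. All struct and field names are hypothetical and the base header is reduced to a single `data` pointer; the point is only to compare putting six pointers inline with putting one pointer to a side structure:

```python
import ctypes

# Hypothetical extension slots: three for the mask, plus metadata, labels, chunks.
EXTRA_FIELDS = [
    ("mask_data", ctypes.c_void_p),
    ("mask_dtype", ctypes.c_void_p),
    ("mask_strides", ctypes.c_void_p),
    ("metadata", ctypes.c_void_p),
    ("labels", ctypes.c_void_p),
    ("chunks", ctypes.c_void_p),
]


class ArrayExtras(ctypes.Structure):
    """Side structure, allocated only for arrays that use the enhancements."""
    _fields_ = EXTRA_FIELDS


class HeaderInline(ctypes.Structure):
    """Every array carries all six extra pointers directly."""
    _fields_ = [("data", ctypes.c_void_p)] + EXTRA_FIELDS


class HeaderIndirect(ctypes.Structure):
    """Every array carries one extra pointer, NULL when no enhancement is used."""
    _fields_ = [("data", ctypes.c_void_p), ("extras", ctypes.POINTER(ArrayExtras))]


P = ctypes.sizeof(ctypes.c_void_p)
inline_cost = ctypes.sizeof(HeaderInline) - P      # six pointers on every array
indirect_cost = ctypes.sizeof(HeaderIndirect) - P  # one pointer on every array

plain = HeaderIndirect()        # extras defaults to a NULL pointer
has_extras = bool(plain.extras)
```

The trade-off is an extra indirection whenever an enhancement is actually used, in exchange for a fixed one-pointer cost on plain arrays.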
>
>
> Yes, this whole thing could get out of hand with too many extras. One of the things you could discuss with Mark is how to deal with this, or limit the modifications. At some point the ndarray class could become cumbersome, complicated, and difficult to maintain. We need to be careful that it doesn't go that way. I'd like to keep it as simple as possible; the question is what is fundamental. The main long term advantage of having masks as part of the base is the possibility of adapted loops in ufuncs, which would give the advantage of speed. But that is just how it looks from where I stand; no doubt others have different priorities.
>
> But, this whole line of discussion sounds a lot like a true sub-class of the NumPy array at the C-level. It has the benefit that only people that use the features of the sub-class have to worry about using the extra space.
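
A small ctypes sketch of what "only sub-class users pay" means at the struct level. The two structures are hypothetical stand-ins, not the real NumPy array struct: the derived layout starts with the base layout unchanged and appends its own fields, so code that only knows the base layout can still read a derived object.

```python
import ctypes


class BaseArrayStruct(ctypes.Structure):
    """Hypothetical, heavily reduced stand-in for the base array header."""
    _fields_ = [("data", ctypes.c_void_p), ("dimensions", ctypes.c_void_p)]


class MaskedArrayStruct(BaseArrayStruct):
    """Hypothetical C-level sub-class: base fields first, mask fields appended."""
    _fields_ = [
        ("mask_data", ctypes.c_void_p),
        ("mask_dtype", ctypes.c_void_p),
        ("mask_strides", ctypes.c_void_p),
    ]


P = ctypes.sizeof(ctypes.c_void_p)
base_size = ctypes.sizeof(BaseArrayStruct)      # unchanged for plain arrays
masked_size = ctypes.sizeof(MaskedArrayStruct)  # base size plus three pointers
```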
>
> Mark and I will talk about this long and hard. Mark has ideas about where he wants to see NumPy go, but I don't think we have fully accounted for where NumPy and its user base *is* and there may be better ways to approach this evolution. If others are interested in the outcome of the discussion please speak up (either on the list or privately) and we will make sure your views get heard and accounted for.
>
>
> Chuck
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion