[Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)
Tue Apr 17 00:44:26 CDT 2012
On Apr 16, 2012, at 11:59 PM, Matthew Brett wrote:
> On Mon, Apr 16, 2012 at 8:40 PM, Travis Oliphant <email@example.com> wrote:
>>>> I think the answer to this is yes, but it could be as a feature-filled sub-class (like the current numpy.ma, except in C).
>>> I'd love to hear that argument fleshed out in more detail - do you have time?
>> My proposal here is to basically take the current github NumPy data-structure and make this a sub-type (in C) of the NumPy 1.6 data-structure which is unchanged in NumPy 1.7.
>> This would not require removing code but would require another PyTypeObject and associated structures. I expect Mark could do this work in 2-4 weeks. We also have other developers who could help in order to get the sub-type in NumPy 1.7. What kind of details would you like to see?
> I was dimly thinking of the same questions that Chuck had - about how
> subclassing would relate to the ufunc changes.
Basically, there are two sets of changes as far as I understand right now:
1) ufunc infrastructure understands masked arrays
2) ndarray grew attributes to represent masked arrays
I am proposing that we keep 1) but change 2) so that only certain kinds of NumPy arrays actually have the extra function pointers (effectively a sub-type). In essence, what I'm proposing is that the NumPy 1.6 PyArrayObject become a base object, and that the other members of the C structure not even be present unless the Masked flag is set. Such changes would not require ripping code out --- just altering the presentation a bit. Yet they could have large long-term implications that we should explore before they get fixed.
Whether masked arrays should be a formal sub-class is actually an unrelated question, and I generally lean in the direction of not encouraging sub-classes of the ndarray. The big questions are: does this object work in the calculation infrastructure? Can I add an array to a masked array? Does it have a sum method? I think it could be argued that a masked array does have an "is a" relationship with an array. It can also be argued that it is better to have a "has a" relationship with an array and be its own object. Either way, this object could still have its first part be binary compatible with a NumPy array, and that is what I'm really suggesting.
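To make the "first part is binary compatible" idea concrete, here is a minimal sketch using Python's ctypes to mimic the C layout. It is only an illustration, not the actual NumPy structures: the base struct is simplified (the Python object header is omitted), and the names BaseArray, MaskedArray, HAS_MASK, mask_dtype, mask_data, mask_strides, nd_of and mask_fields are all hypothetical.

```python
import ctypes

# Simplified stand-in for the 1.6-style array struct (object header omitted).
class BaseArray(ctypes.Structure):
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("nd", ctypes.c_int),
        ("dimensions", ctypes.c_void_p),
        ("strides", ctypes.c_void_p),
        ("base", ctypes.c_void_p),
        ("descr", ctypes.c_void_p),
        ("flags", ctypes.c_int),
    ]

# Hypothetical sub-type: the base struct comes first, extra members follow.
class MaskedArray(ctypes.Structure):
    _fields_ = [
        ("array", BaseArray),            # must be the first member
        ("mask_dtype", ctypes.c_void_p),
        ("mask_data", ctypes.c_void_p),
        ("mask_strides", ctypes.c_void_p),
    ]

HAS_MASK = 0x1  # hypothetical flag bit, not a real NumPy flag


def nd_of(address):
    """Code that only knows the base layout can read either kind of object."""
    return BaseArray.from_address(address).nd


def mask_fields(address):
    """Reinterpret as the sub-type only when the flag says the members exist."""
    if not BaseArray.from_address(address).flags & HAS_MASK:
        return None
    return MaskedArray.from_address(address)


plain = BaseArray()
plain.nd = 1

masked = MaskedArray()
masked.array.nd = 2
masked.array.flags = HAS_MASK

print(MaskedArray.array.offset)             # 0: the base part starts the struct
print(nd_of(ctypes.addressof(plain)))       # 1
print(nd_of(ctypes.addressof(masked)))      # 2
print(mask_fields(ctypes.addressof(plain))) # None: no extra members present
```

Under these assumptions, plain arrays pay no size cost for the mask members, while any code written against the base layout keeps working on the extended object, whether it is presented to Python as a sub-class or as a separate type.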
>> I just think we need more data and uses and this would provide a way to get that without making a forced decision one way or another.
> Is the proposal that this would be an alternative API to numpy.ma?
> Is numpy.ma not itself satisfactory as a test of these uses, because
> of performance or some other reason?
>>>>> 2) Will likely changes to the masked array API make any difference to
>>>>> the number of extra pointers? Likely answer no?
>>>>> Is that right?
>>>> The answer to this is very likely no on the Python side. But on the C side, there could be some differences (i.e. are masked arrays a sub-class of the ndarray or not).
>>>>> I have the impression that the masked array API discussion still has
>>>>> not come out fully into the unforgiving light of discussion day, but
>>>>> if the answer to 2) is No, then I suppose the API discussion is not
>>>>> relevant to the 3 pointers change.
>>>> You are correct that the API discussion is separate from this one. Overall, I was surprised at how fervently people would oppose ABI changes. As has been pointed out, NumPy and Numeric before it were not really designed to prevent having to recompile when changes were made. I'm still not sure that promoting better availability of downstream binary packages would not be a better overall solution than worrying excessively about ABI changes in NumPy. But that is the current climate.
>>> The objectors object to any binary ABI change, but not specifically
>>> three pointers rather than two or one?
>> Adding pointers is not really an ABI change (but removing them after they were there would be...) It's really just the addition of data to the NumPy array structure that they aren't going to use. Most of the time it would not be a real problem (the number of use-cases where you have a lot of small NumPy arrays is small), but when it is a problem it is very annoying.
>>> Is their point then about ABI breakage? Because that seems like a
>>> different point again.
>> Yes, it's not that.
>>> Or is it possible that they are in fact worried about the masked array API?
>> I don't think most people whose opinion would be helpful are really tuned in to the discussion at this point. I think they just want us to come up with an answer and then move forward. But, they will judge us based on the solution we come up with.
>>>> Mark and I will talk about this long and hard. Mark has ideas about where he wants to see NumPy go, but I don't think we have fully accounted for where NumPy and its user base *is* and there may be better ways to approach this evolution. If others are interested in the outcome of the discussion please speak up (either on the list or privately) and we will make sure your views get heard and accounted for.
>>> I started writing something about this but I guess you'd know what I'd
>>> write, so I only humbly ask that you consider whether it might be
>>> doing real damage to allow substantial discussion that is not
>>> documented or argued out in public.
>> It will be documented and argued in public. We are just going to have one off-list conversation to try and speed up the process. You make a valid point, and I appreciate the perspective. Please speak up again after hearing the report if something is not clear. I don't want this to even have the appearance of a "back-room" deal.
>> Mark and I will have conversations about NumPy while he is in Austin. There are many other active stake-holders whose opinions and views are essential for major changes. Mark and I are working on other things besides just NumPy and all NumPy changes will be discussed on list and require consensus or super-majority for NumPy itself to change. I'm not sure if that helps. Is there more we can do?
> As you might have heard me say before, my concern is that it has not
> been easy to have good discussions on this list. I think the problem
> has been that it has not been clear what the culture was, and how
> decisions got made, and that had led to some uncomfortable and
> unhelpful discussions. My plea would be for you as BDF$N to strongly
> encourage on-list discussions and discourage off-list discussions as
> far as possible, and to help us make the difficult public effort to
> bash out the arguments to clarity and consensus. I know that's a big
> See you,