[Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

Travis Oliphant travis@continuum...
Mon Apr 16 22:40:53 CDT 2012


>> 
>> I think the answer to this is yes, but it could be as a feature-filled sub-class (like the current numpy.ma, except in C).
> 
> I'd love to hear that argument fleshed out in more detail - do you have time?


My proposal here is basically to take the NumPy data structure currently on GitHub and make it a sub-type (in C) of the NumPy 1.6 data structure, which would remain unchanged in NumPy 1.7.

This would not require removing code but would require another PyTypeObject and associated structures.  I expect Mark could do this work in 2-4 weeks.   We also have other developers who could help in order to get the sub-type in NumPy 1.7.     What kind of details would you like to see? 
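
For readers less familiar with the sub-type idea, here is a minimal sketch of its memory-layout side, using Python's `ctypes` to stand in for C structs. The names `BaseArray` and `MaskedArray` and their fields are hypothetical and do not reproduce NumPy's actual array struct. The sketch also leaves out the separate `PyTypeObject` a real C sub-type would need. It only shows that if the masked type extends the unchanged base struct, code written against the base layout keeps working and plain arrays carry none of the mask fields.

```python
import ctypes

# Hypothetical stand-in for the unchanged NumPy 1.6 array struct.
# Field names and order are illustrative only, not NumPy's real layout.
class BaseArray(ctypes.Structure):
    _fields_ = [
        ("data", ctypes.c_char_p),
        ("nd", ctypes.c_int),
        ("dimensions", ctypes.POINTER(ctypes.c_ssize_t)),
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
        ("flags", ctypes.c_int),
    ]

# Hypothetical masked sub-type: ctypes appends a subclass's _fields_
# after the inherited ones, mirroring a C struct whose first member
# is the base struct.
class MaskedArray(BaseArray):
    _fields_ = [
        ("maskdata", ctypes.c_char_p),
        ("maskstrides", ctypes.POINTER(ctypes.c_ssize_t)),
        ("has_mask", ctypes.c_int),
    ]

def base_offsets(cls):
    """Byte offset of every BaseArray field as laid out inside cls."""
    return {name: getattr(cls, name).offset for name, _ in BaseArray._fields_}

# Code written against BaseArray finds its fields at the same offsets
# in MaskedArray ...
print(base_offsets(BaseArray) == base_offsets(MaskedArray))
# ... and only the sub-type pays for the extra mask fields.
print(ctypes.sizeof(BaseArray), ctypes.sizeof(MaskedArray))
```

The exact sizes printed depend on the platform's pointer size and alignment rules; the first comparison holds regardless.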

In this way, the masked-array approach to missing data could be pursued by those who prefer it without affecting any other users of NumPy arrays (and the numpy.ma sub-class could be deprecated). I would also like to add missing-data dtypes (ideally before NumPy 1.7, but that is not a requirement for the release).

I just think we need more data and more use cases, and this would provide a way to get them without forcing a decision one way or the other.

> 
>>> 2) Will likely changes to the masked array API make any difference to
>>> the number of extra pointers?  Likely answer no?
>>> 
>>> Is that right?
>> 
>> The answer to this is very likely no on the Python side. But on the C side there could be some differences (i.e. whether masked arrays are a sub-class of the ndarray or not).
>> 
>>> 
>>> I have the impression that the masked array API discussion still has
>>> not come out fully into the unforgiving light of discussion day, but
>>> if the answer to 2) is No, then I suppose the API discussion is not
>>> relevant to the 3 pointers change.
>> 
>> You are correct that the API discussion is separate from this one. Overall, I was surprised at how fervently people would oppose ABI changes. As has been pointed out, NumPy and Numeric before it were not really designed to avoid the need to recompile when changes were made. I'm still not convinced that the better overall solution isn't to promote wider availability of downstream binary packages rather than to worry excessively about ABI changes in NumPy. But that is the current climate.
> 
> The objectors object to any binary ABI change, but not specifically
> three pointers rather than two or one?

Adding pointers is not really an ABI change (but removing them after they were there would be...)  It's really just the addition of data to the NumPy array structure that they aren't going to use.  Most of the time it would not be a real problem (the number of use-cases where you have a lot of small NumPy arrays is small), but when it is a problem it is very annoying. 
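
A minimal sketch of the point about appended pointers, again using `ctypes` as a stand-in for C struct layout. `HeaderV1`, `HeaderV2` and their fields are hypothetical, not NumPy's real struct, and the figure of one million arrays is an arbitrary illustration. Appending fields leaves the offsets of existing fields unchanged, which is why code that only reads those fields through a pointer keeps working (a C extension that embeds the struct by value would still see its size change). The cost is a few bytes per array, which only adds up when many small arrays are alive at once.

```python
import ctypes

# Hypothetical 1.6-style array header (illustrative fields only).
class HeaderV1(ctypes.Structure):
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("nd", ctypes.c_int),
        ("dimensions", ctypes.c_void_p),
        ("strides", ctypes.c_void_p),
        ("flags", ctypes.c_int),
    ]

# The same header with three extra pointers appended at the end.
class HeaderV2(ctypes.Structure):
    _fields_ = HeaderV1._fields_ + [
        ("extra1", ctypes.c_void_p),
        ("extra2", ctypes.c_void_p),
        ("extra3", ctypes.c_void_p),
    ]

# Existing fields keep their offsets, so code compiled against V1 still
# finds them where it expects (removing or reordering fields would move them).
same_offsets = all(
    getattr(HeaderV1, name).offset == getattr(HeaderV2, name).offset
    for name, _ in HeaderV1._fields_
)

# Per-array cost of the appended pointers, and what it adds up to
# when an application keeps many small arrays alive at once.
extra_per_array = ctypes.sizeof(HeaderV2) - ctypes.sizeof(HeaderV1)
n_arrays = 1_000_000
print(same_offsets, extra_per_array, extra_per_array * n_arrays / 2**20, "MiB")
```

On a typical 64-bit build each pointer is 8 bytes, so the three extra pointers cost 24 bytes per array.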

> 
> Is their point then about ABI breakage?  Because that seems like a
> different point again.

Right, their point is not about ABI breakage.

> 
> Or is it possible that they are in fact worried about the masked array API?

I don't think most people whose opinion would be helpful are really tuned in to the discussion at this point.  I think they just want us to come up with an answer and then move forward.    But, they will judge us based on the solution we come up with. 

> 
>> Mark and I will talk about this long and hard.  Mark has ideas about where he wants to see NumPy go, but I don't think we have fully accounted for where NumPy and its user base *is* and there may be better ways to approach this evolution.    If others are interested in the outcome of the discussion please speak up (either on the list or privately) and we will make sure your views get heard and accounted for.
> 
> I started writing something about this but I guess you'd know what I'd
> write, so I only humbly ask that you consider whether it might be
> doing real damage to allow substantial discussion that is not
> documented or argued out in public.

It will be documented and argued in public.     We are just going to have one off-list conversation to try and speed up the process.    You make a valid point, and I appreciate the perspective.     Please speak up again after hearing the report if something is not clear.   I don't want this to even have the appearance of a "back-room" deal.     

Mark and I will have conversations about NumPy while he is in Austin. There are many other active stakeholders whose opinions and views are essential for major changes. Mark and I are working on other things besides just NumPy, and all NumPy changes will be discussed on the list and require consensus or a super-majority for NumPy itself to change. I'm not sure if that helps. Is there more we can do?

Thanks, 

-Travis



> 
> See you,
> 
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


