[Numpy-discussion] Making numpy sensible: backward compatibility please
Fri Sep 28 16:43:00 CDT 2012
Thank you for expressing this voice, Gael. It is an important perspective. The main reason 1.7 has taken so long to release is that I'm concerned about these kinds of changes and want to either remove them or put in adequate warnings before moving forward.
It's a long and complex process. Thanks for providing feedback when you encounter problems so that we can do our best to address them. I agree that we should be much more cautious about changes in the 1.X series of NumPy, particularly semantic changes that will break existing code. How we handle situations where 1.6 changed behavior from 1.5 and the change wasn't reported until now is an open question that depends on the particular problem.
On Sep 28, 2012, at 4:23 PM, Gael Varoquaux wrote:
> Hi numpy developers,
> First of all, thanks a lot for the hard work you put in numpy. I know
> very well that maintaining such a core library is a lot of effort and a
> service to the community. But "with great dedication, comes great
> responsibility" :).
> I find that Numpy is a bit of a wild horse, a moving target. I have just
> fixed a fairly nasty bug in scikit-learn that was introduced by a
> change of semantics in ordering when doing copies with numpy. I have been
> running, working on, and developing scikit-learn while tracking numpy's
> development tree and, as far as I can tell, I never saw warnings raised
> in our code that something was going to change, or had changed.
> In other settings, changes in array inheritance and 'base' propagation
> have made impossible some of our memmap-related use cases that used to work
> under previous numpy versions. Others have been hitting difficulties related
> to these changes in behavior. Not to mention the new casting rules
> (default: 'same_kind') that break a lot of code, or the ABI change that,
> while not done on purpose, ended up causing us a lot of pain.
> My point here is that having code that works and gives correct results
> with new releases of numpy is more challenging than it should be. I
> cannot claim that I disagree with the changes that I mention above. They
> were all implemented for a good reason and can all be considered as
> overall improvements to numpy. However the situation is that given a
> complex codebase relying on numpy that works at a time t, the chances
> that it works flawlessly at time t + 1y are thin. I am not too proud that
> we managed to release scikit-learn 0.12 with a very ugly bug under numpy
> 1.7. That happened although we have 90% of test coverage, buildbots under
> different numpy versions, and a lot of people, including me, using our
> development tree on a day to day basis with bleeding edge numpy. Most
> code in research settings or R&D industry does not benefit from such
> software engineering and I believe is much more likely to suffer from
> changes in numpy.
> I think that this is a cultural issue: priority is not given to stability
> and backward compatibility. I think that this culture is very much
> ingrained in the Python world, that likes iteratively cleaning its
> software design. For instance, I have the feeling that in
> scikit-learn, we probably fall into the same trap. That said, such
> behavior cannot fare well for a base scientific environment. People tell
> me that if they take old matlab code, the odds that it will still work
> are much higher than with Python code. As a geek, I tend to reply that we
> get a lot out of this mobility, because we accumulate less cruft.
> However, in research settings, for reproducibility reasons, one needs to
> be able to pick up an old codebase and trust its results without knowing
> its intricacies.
> From a practical standpoint, I believe that people implementing large
> changes to the numpy codebase, or any other core scipy package, should
> think really hard about their impact. I do realise that the changes are
> discussed on the mailing lists, but there is a lot of activity to follow
> and I don't believe that it is possible for many of us to monitor the
> discussions. Also, putting more emphasis on backward compatibility is
> possible. For instance, the 'order' parameter added to np.copy could have
> defaulted to the old behavior, 'K', for a year, with a
> DeprecationWarning, same thing for the casting rules.
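To make the suggestion above concrete, here is a minimal sketch of such a deprecation cycle. It is not how np.copy is implemented: the helper name resolve_copy_order, the sentinel _UNSET, and the constant OLD_DEFAULT_ORDER are hypothetical, and 'K' is used as the old default only because the paragraph above names it. The idea is that callers who omit the argument keep the old behavior but are told that they depend on a default scheduled to change.

```python
import warnings

# Sentinel so the function can tell "argument omitted" from an explicit value.
_UNSET = object()

# Assumption taken from the email: 'K' stands for the old default behavior.
OLD_DEFAULT_ORDER = "K"


def resolve_copy_order(order=_UNSET):
    """Hypothetical helper: pick the memory order for a copy.

    Callers that omit `order` keep the old behavior for now, but are told
    that they rely on a default that is scheduled to change.
    """
    if order is _UNSET:
        warnings.warn(
            "the default 'order' will change in a future release; "
            "pass order=%r explicitly to keep the current behavior"
            % OLD_DEFAULT_ORDER,
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return OLD_DEFAULT_ORDER
    return order


if __name__ == "__main__":
    # A downstream test suite can escalate the warning to an error so that
    # reliance on the old default is caught before the default flips.
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            resolve_copy_order()
        except DeprecationWarning as exc:
            print("caught:", exc)
        print(resolve_copy_order("C"))
```

Python hides DeprecationWarning by default in most contexts, so a cycle like this only helps downstream projects whose test runs enable or escalate the warning, as the usage section at the bottom does.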
> Thank you for reading this long email. I don't mean it to be a complaint
> about the past, but more a suggestion on something to keep in mind when
> making changes to core projects.
> https://github.com/scikit-learn/scikit-learn/commit/7842748cf777412c506a8c0ed28090711d3a3783
> http://mail.scipy.org/pipermail/numpy-discussion/2012-September/063985.html
> http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063126.html