[Numpy-discussion] Making numpy sensible: backward compatibility please

Nathaniel Smith njs@pobox....
Fri Sep 28 21:03:01 CDT 2012


On Fri, Sep 28, 2012 at 10:23 PM, Gael Varoquaux <
gael.varoquaux@normalesup.org> wrote:
> Hi numpy developers,
>
> First of all, thanks a lot for the hard work you put in numpy. I know
> very well that maintaining such a core library is a lot of effort and a
> service to the community. But "with great dedication, comes great
> responsibility" :).

There've been several long discussions about this on numpy-discussion over
the last few months, actually... a few that I remember off the top of my
head:

http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062496.html
http://www.mail-archive.com/numpy-discussion@scipy.org/msg37500.html
http://mail.scipy.org/pipermail/numpy-discussion/2012-May/thread.html#62298

> I find that Numpy is a bit of a wild horse, a moving target. I have just
> fixed a fairly nasty bug in scikit-learn [1] that was introduced by a
> change of semantics in ordering when doing copies with numpy. I have been
> running and developing scikit-learn while tracking numpy's
> development tree and, as far as I can tell, I never saw warnings raised
> in our code that something was going to change, or had changed.

It looks like this is a bug caused by the 1.7 pre-release versions? Have
you reported it? (I swear I saw some code go by recently that involved
guessing array orders to match the input, but I can't recall where.)
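The copy-ordering issue discussed above comes down to the memory layout a copy ends up with. The sketch below assumes a recent NumPy, where np.copy accepts order='C', 'F', 'A' or 'K' ('K' keeps the input's layout as closely as possible) and where np.copy and the ndarray.copy method have different defaults for order. It does not reproduce the scikit-learn bug itself; it only shows how the layout of a copy depends on order, and how layout-sensitive code can request a layout explicitly.

```python
import numpy as np

# A small Fortran-ordered (column-major) 2x3 array.
a = np.asfortranarray(np.arange(6).reshape(2, 3))

c_copy = np.copy(a, order="C")  # force a row-major (C-contiguous) copy
k_copy = np.copy(a, order="K")  # keep the input's layout as closely as possible
m_copy = a.copy()               # the ndarray.copy method defaults to order="C"

for name, arr in [("input", a), ("order=C", c_copy), ("order=K", k_copy), ("a.copy()", m_copy)]:
    print(name, "C:", arr.flags["C_CONTIGUOUS"], "F:", arr.flags["F_CONTIGUOUS"])

# The values are identical; only the memory layout differs.
assert np.array_equal(c_copy, k_copy)

# Code that hands arrays to layout-sensitive routines can request a layout
# explicitly instead of depending on whatever default a given release uses.
c_safe = np.ascontiguousarray(a)
f_safe = np.asfortranarray(a)
```

Asking for the layout at the point where it matters makes the code independent of any future change to copy defaults.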

> In other settings, changes in array inheritance and 'base' propagation
> have broken some of our memmap-related use cases that used to work
> under previous numpy versions [2]. Others have been hitting difficulties
> related to these changes in behavior [3]. Not to mention the new casting
> rules (default: 'same_kind') that break a lot of code, or the ABI change
> that, while not done on purpose, ended up causing us a lot of pain.

The same_kind rule change has been reverted for 1.7 for exactly this
reason, and several dozen changes have been merged in the last month or two
trying to clear up all the little regressions we've found so far in 1.7pre.
And we've been trying to be more rigorous about following a formal
deprecation schedule in general.

https://github.com/numpy/numpy/pull/440
https://github.com/numpy/numpy/pull/451
https://github.com/numpy/numpy/pull/280
https://github.com/numpy/numpy/pull/350
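To make the 'same_kind' rule concrete: under this rule a cast is allowed only if it is safe or stays within the same kind (float64 to float32, say). Writing a float result into an integer output array is therefore refused. In later NumPy releases, 'same_kind' did become the default for in-place operations. The sketch below assumes a recent NumPy. It uses np.can_cast and the ufunc casting= keyword to show the rule, and shows that the old permissive behaviour must then be requested explicitly with casting='unsafe'.

```python
import numpy as np

# np.can_cast answers "is this dtype conversion allowed under this casting rule?"
print(np.can_cast(np.float64, np.int64, casting="same_kind"))    # False: float -> int changes kind
print(np.can_cast(np.float64, np.float32, casting="same_kind"))  # True: stays within floats
print(np.can_cast(np.float64, np.int64, casting="unsafe"))       # True: anything goes

a = np.arange(3)  # integer array [0, 1, 2]
refused = False
try:
    # Writing a float result into an integer output is rejected under same_kind.
    np.add(a, 1.5, out=a, casting="same_kind")
except TypeError as exc:  # NumPy raises a TypeError subclass here
    refused = True
    print("refused:", exc)

# The old permissive behaviour has to be requested explicitly; values are truncated.
np.add(a, 1.5, out=a, casting="unsafe")
print(a)  # [1 2 3]
```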

I have mixed feelings about the .base change. If it were possible to do a
deprecation period I'd definitely be in favor, but I don't see how, unless
we were to remove Python access to it altogether, and that's pretty
unlikely. The problem is it's an attractive nuisance; the *only* reliable
thing you've *ever* been able to do with it is pin an object in memory when
constructing an array directly, but people keep expecting more, so all the
breakages have been in code that was IMHO already on thin ice. And from my
point of view it wouldn't be the *most* terrible thing if the result here
is that you're forced to make memmap pickling work in numpy directly for
everybody ;-). But I go back and forth in my own mind, because of the
things you say. Other ideas very welcome. (Maybe we should rename the
python attribute to ._base - with appropriate deprecation period of course
- just to encourage people to stop doing unwise things with it, and *then*
make the change that's tripping you up now?)
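Much of the code leaning on .base really wants to know whether two arrays alias the same memory. Later NumPy releases answer that directly with np.shares_memory (exact check) and np.may_share_memory (cheaper, bounds-based, may report false positives). These functions postdate this thread and are not something proposed in it. The sketch below, assuming a recent NumPy, asks the aliasing question without depending on where a chain of .base references happens to end.

```python
import numpy as np

a = np.arange(10)
view = a[2:8]           # basic slicing returns a view onto a's buffer
strided = view[::2]     # a view of a view
independent = a.copy()  # a new, separate buffer

print(view.base is a)   # True: a owns the data this slice points into

# Whether strided.base is `view` or `a` is an implementation detail that has
# changed between releases, so ask the aliasing question directly instead.
print(np.shares_memory(a, strided))      # True: exact overlap check
print(np.shares_memory(a, independent))  # False
print(np.may_share_memory(a, strided))   # True: bounds-only check
```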

> My point here is that having code that works and gives correct results
> with new releases of numpy is more challenging than it should be. I
> cannot claim that I disagree with the changes that I mention above. They
> were all implemented for a good reason and can all be considered as
> overall improvements to numpy. However the situation is that given a
> complex codebase relying on numpy that works at a time t, the chances
> that it works flawlessly at time t + 1y are slim. I am not too proud that
> we managed to release scikit-learn 0.12 with a very ugly bug under numpy
> 1.7. That happened although we have 90% of test coverage, buildbots under
> different numpy versions, and a lot of people, including me, using our
> development tree on a day to day basis with bleeding edge numpy. Most
> code in research settings or R&D industry does not benefit from such
> software engineering and, I believe, is much more likely to suffer from
> changes in numpy.
>
> I think that this is a cultural issue: priority is not given to stability
> and backward compatibility. I think that this culture is very much
> ingrained in the Python world, that likes iteratively cleaning its
> software design. For instance, I have the feeling that in
> scikit-learn we probably fall into the same trap. That said, such a
> behavior cannot fare well for a base scientific environment. People tell
> me that if they take old matlab code, the odds that it still works
> are much higher than with Python code. As a geek, I tend to reply that we
> get a lot out of this mobility, because we accumulate less cruft.
> However, in research settings, for reproducibility reasons, one needs to
> be able to pick up an old codebase and trust its results without knowing
> its intricacies.
>
> From a practical standpoint, I believe that people implementing large
> changes to the numpy codebase, or any other core scipy package, should
> think really hard about their impact. I do realise that the changes are
> discussed on the mailing lists, but there is a lot of activity to follow
> and I don't believe that it is possible for many of us to monitor the
> discussions. Also, putting more emphasis on backward compatibility is
> possible. For instance, the 'order' parameter added to np.copy could have
> defaulted to the old behavior, 'K', for a year, with a
> DeprecationWarning, same thing for the casting rules.

Maybe it still can, but you have to tell us details :-)
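The transition period suggested above (keep the old default for a release cycle, but warn when callers rely on it) is usually built with a sentinel default, so that "argument not passed" can be told apart from any real value. Below is a minimal sketch under stated assumptions: copy_with_transition is a hypothetical helper, not a NumPy function, and 'K' as the "old" default is taken from the suggestion in the email, not verified against any release.

```python
import warnings
import numpy as np

_NOT_GIVEN = object()  # sentinel: distinguishes "not passed" from any real value

def copy_with_transition(a, order=_NOT_GIVEN):
    """Hypothetical wrapper: keep the old default for now, but warn callers who rely on it."""
    if order is _NOT_GIVEN:
        warnings.warn(
            "the default 'order' of this copy will change; pass order= explicitly "
            "to keep the current behaviour",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this wrapper
        )
        order = "K"  # assumed "old" default, per the suggestion above
    return np.copy(a, order=order)

# Usage: only the call that relies on the default triggers the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    f = np.asfortranarray(np.ones((2, 3)))
    out = copy_with_transition(f)                  # warns, keeps Fortran layout
    explicit = copy_with_transition(f, order="C")  # explicit, no warning
```

Downstream projects can make such warnings impossible to miss by running their test suites with `python -W error::DeprecationWarning`, which turns every DeprecationWarning into an exception.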

In general numpy development just needs more people keeping track of these
things. If you want to keep an open source stack functional sometimes you
have to pay a tax of your time to make sure the levels below you
continue to suit your needs.

-n

> Thank you for reading this long email. I don't mean it to be a complaint
> about the past, but more a suggestion on something to keep in mind when
> making changes to core projects.
>
> Cheers,
>
> Gaël
>
> ____
>
> [1] https://github.com/scikit-learn/scikit-learn/commit/7842748cf777412c506a8c0ed28090711d3a3783
>
> [2] http://mail.scipy.org/pipermail/numpy-discussion/2012-September/063985.html
>
> [3] http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063126.html