[Numpy-discussion] Making numpy sensible: backward compatibility please

Gael Varoquaux gael.varoquaux@normalesup....
Sat Sep 29 05:43:32 CDT 2012


Hi Nathaniel,

First of all, thanks for your investment in numpy. You have really been
moving the project forward lately.

On Sat, Sep 29, 2012 at 03:03:01AM +0100, Nathaniel Smith wrote:
> > I have just fixed a fairly nasty bug in scikit-learn that was
> > introduced by change of semantics in ordering when doing copies with
> > numpy.

> It looks like this is a bug caused by the 1.7 pre-release versions? Have you
> reported it?

I just found this bug yesterday morning, because a user reported a bug in
scikit-learn. I wrote this email after fixing our bug using the
"order='K'" option of np.copy. I think that we didn't find this problem
because all the core developers of scikit-learn are very careful to pass
in their arrays in the right ordering to avoid copies. That said, you
have a point that this also reveals a failure in our test suite: we don't
test systematically for Fortran and C ordered inputs. We probably should.
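
For illustration, here is a minimal sketch of the layout issue and of the
kind of test mentioned above. It assumes a NumPy where np.copy accepts
order='K'; the helper name check_order_invariance is hypothetical and is
not part of scikit-learn or numpy.

```python
import numpy as np

# A Fortran-ordered (column-major) 2x3 array.
a_f = np.asfortranarray(np.arange(6.0).reshape(2, 3))

# order='K' keeps the memory layout of the input as closely as possible,
# while order='C' forces a row-major copy.
copy_k = np.copy(a_f, order='K')
copy_c = np.copy(a_f, order='C')


def check_order_invariance(func, X):
    """Hypothetical test helper: func must give the same result
    for C-ordered and Fortran-ordered versions of the same data."""
    result_c = func(np.ascontiguousarray(X))
    result_f = func(np.asfortranarray(X))
    np.testing.assert_allclose(result_c, result_f)


# Example: a column sum should not depend on the memory layout.
check_order_invariance(lambda X: X.sum(axis=0), np.arange(6.0).reshape(2, 3))
```

Running every estimator through a helper of this kind would catch code
that silently depends on the ordering of its input.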

> The same_kind rule change has been reverted for 1.7 for exactly this reason,

Sorry, I haven't been following well enough. I think that this is
probably a good idea. I would vote for warnings to be raised (maybe it is
the case), and maybe in the long term (2.0) relying on the same_kind
rule.

> And we've been trying to be more rigorous about following a formal
> deprecation schedule in general.

Yes, I think that formal deprecation schedules are important. We try to
do the same in scikit-learn. It's a pain and as developers we have to
force ourselves, but it's useful for users.

> I have mixed feelings about the .base change.

I like it. I think that it's useful. I just think that its implications
are not fully understood yet, and that new mechanisms need to be offered
to replace what it changed.

> all the breakages have been in code that was IMHO already on thin ice.

But that served useful use cases.

> And from my point of view it wouldn't be the *most* terrible thing if
> the result here is that you're forced to make memmap pickling work in
> numpy directly for everybody ;-).

I am not sure that I understand your sentence here.

Actually, getting off topic here, but would people be interested in
discussing a technical solution to our problem:
http://mail.scipy.org/pipermail/numpy-discussion/2012-September/063985.html
i.e. finding the filename and offset of an array inheriting from memmapped
memory, when such a filename exists.
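
To make the problem concrete, here is one possible sketch, not a proposed
patch. It assumes that following .base from the array eventually reaches
the np.memmap that wraps the mmap object, which is exactly the assumption
that depends on how numpy sets .base. The function name
find_memmap_origin is hypothetical.

```python
import mmap
import os
import tempfile

import numpy as np


def find_memmap_origin(arr):
    """Return (filename, byte_offset_in_file) if arr is backed by an
    np.memmap reachable through .base, else None. Hypothetical helper."""
    obj = arr
    while obj is not None:
        # The root memmap is the one whose base is the mmap object itself.
        if isinstance(obj, np.memmap) and isinstance(obj.base, mmap.mmap):
            root_addr = obj.__array_interface__['data'][0]
            arr_addr = arr.__array_interface__['data'][0]
            # obj.offset is the file offset of the root memmap's first byte.
            return obj.filename, obj.offset + (arr_addr - root_addr)
        obj = getattr(obj, 'base', None)
    return None


# Usage: a plain ndarray view into a memmapped file.
path = os.path.join(tempfile.mkdtemp(), 'data.bin')
np.arange(12, dtype=np.float64).tofile(path)
mm = np.memmap(path, dtype=np.float64, mode='r', offset=16, shape=(10,))
view = np.asarray(mm)[3:]

filename, offset = find_memmap_origin(view)
# Reopening the file at the recovered offset gives the same data.
reopened = np.memmap(filename, dtype=view.dtype, mode='r',
                     offset=offset, shape=view.shape)
```

The sketch only handles the start address; a complete solution would
also need the strides of non-contiguous views.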

I am ready to put in effort and send in a patch, but before writing such
a patch, I'd like to have some consensus of core developers on an
acceptable solution.

> > For instance, the 'order' parameter added to np.copy could have
> > defaulted to the old behavior, 'K', for a year, with a
> > DeprecationWarning, same thing for the casting rules.

> Maybe it still can, but you have to tell us details :-)

Well, I would think that having a default "order='K'" in np.copy, and
adding such an argument to ndarray.copy would avoid breakage.
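
As a sketch of what such a transition could look like (this is not
numpy's implementation; the wrapper name copy_with_transition and the
warning text are assumptions made for illustration):

```python
import warnings

import numpy as np

_UNSET = object()  # sentinel: tells "order not given" apart from any real value


def copy_with_transition(a, order=_UNSET):
    """Hypothetical copy wrapper: keeps the old default, order='K',
    but warns callers who rely on it without asking for it."""
    if order is _UNSET:
        warnings.warn(
            "the default value of 'order' will change from 'K' to 'C'; "
            "pass order= explicitly to keep the current behavior",
            DeprecationWarning,
            stacklevel=2,
        )
        order = 'K'
    return np.array(a, order=order, copy=True)


# Callers that pass order explicitly see no warning and are unaffected.
a_f = np.asfortranarray(np.ones((2, 3)))
explicit = copy_with_transition(a_f, order='K')
```

With this pattern, existing code keeps its old results during the
deprecation period, and only the warning tells users to state the
ordering they want.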

> In general numpy development just needs more people keeping track of
> these things. If you want to keep an open source stack functional
> sometimes you have to pay a tax of your time to making sure the levels
> below you will continue to suit your needs.

I partly agree. I think that it goes both ways. I think that downstream
needs to follow upstream, which I do, by running the development tree on
my work desktop. Downstream needs to push bugs and difficulties upstream.
On the other hand, upstream developers should think in terms of impact
and deprecation. When somebody (I can't remember whether it was Josef
Perktold or Christoph Gohlke) ran different downstream packages with
an RC of numpy or scipy a few months ago, that was terribly useful.

I can offer only limited help, as my schedule is way too packed. On the
one hand I may appear as useless to the community because I spend a lot
of time in meetings, managing students, or writing grant proposals or
papers, instead of following the technical developments. On the other
hand, such activities enable me to hire people to work on open source
software, to have students that invest time on those technologies, and to
have the scipy ecosystem be accepted by the 'establishment'.

To avoid having the discussion going in circles, what I can suggest in
concrete terms with my limited time available is:

 - I can invest time on the mmap issue, and work with core numpy
   developers on a patch
 - I suggest that np.copy and ndarray.copy should take order='K' as the
   default (ndarray.copy does not have such a keyword argument).

Let's make numpy 1.7 rock!

Cheers,

Gaël

PS: rereading my previous mail in the thread, I found that it was full of
sentences that did not make grammatical sense. I apologize for this. It
may look like I am writing my mails hastily, but in fact I am very
slightly dyslexic, and I tend not to see missing words or letters.


