[Numpy-discussion] copy on demand

Alexander Schmolck a.schmolck at gmx.net
Mon Jun 17 08:12:03 CDT 2002


Konrad Hinsen <hinsen at cnrs-orleans.fr> writes:

[Konrad wants to keep alias-slicing behavior for backward-compatibility]
> > I sympathize with this view. However, I think the solution to this problem
> > should be a compatibility wrapper rather than a design compromise.
> > 
> > There are at least 2 reasons why:
> > 
> > 1. Numarray has quite a few incompatibilities to Numeric anyway, so even
> >    without this change you'd be forced to rewrite all or most of those scripts
> 
> The question is how much effort it is to update code. If it is easy,
> most people will do it sooner or later. If it is difficult, they won't.
> And that will lead to a split in the user community, which I think
> is highly detrimental to the further development of NumPy and Numarray.

I agree that avoiding a split of the Numeric user community is a crucial issue
and that effort has to go into making the transition painless enough to
actually happen (in most cases; maybe even 90% or more, as you say).

> 
> A compatibility wrapper won't change this. Assume that I have tons of
> code that I can't update because it's too much effort. Instead I use
> the compatibility wrapper. When I add a line or a function to that
> code, it will of course stick to the old conventions. When I add a new
> module, I will also prefer the old conventions, for consistency. And
> other people working with the code will pick up the old conventions as
> well. At the same time, other people will use the new conventions.
> There will be two parts of the community that cannot easily read each
> other's code.

I don't think the situation is quite so bleak. Yes, library code should be
converted, and although a compatibility wrapper might be helpful in the
process, I agree that it isn't a full solution for the reasons you cite above.

But there is plenty of code that is mainly used internally and no longer
changes (much), for which I think a compatibility wrapper is a fine solution
(and might be preferable to conversion, even if it involves little effort). If
I had some matlab (or C) code that fulfilled similar criteria, I'd also prefer
to wrap it somehow rather than convert it to python.
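
What such a compatibility wrapper could look like is sketched below. This is
only an illustration under stated assumptions: the class name ViewSlicingArray
and its tolist() method are hypothetical, it is not part of Numeric or
numarray, and it relies on the memoryview type of current Python 3 rather
than on anything available in 2002. It wraps a copy-slicing array.array so
that slices alias the parent's data, as legacy code would expect.

```python
import array


class ViewSlicingArray:
    """Hypothetical wrapper: slicing returns an alias, not a copy."""

    def __init__(self, data, typecode="d"):
        if isinstance(data, memoryview):
            # Already a view onto some parent's buffer: share it.
            self._mv = data
        else:
            # Fresh data: own a new array.array and look at it via a view.
            self._mv = memoryview(array.array(typecode, data))

    def __getitem__(self, index):
        result = self._mv[index]
        if isinstance(index, slice):
            # A memoryview slice shares memory with the original buffer.
            return ViewSlicingArray(result)
        return result

    def __setitem__(self, index, value):
        # Only single-element assignment is supported in this sketch.
        self._mv[index] = value

    def __len__(self):
        return len(self._mv)

    def tolist(self):
        return self._mv.tolist()


a = ViewSlicingArray([1.0, 2.0, 3.0, 4.0])
b = a[1:3]       # aliases elements 1 and 2 of a
b[0] = 99.0      # the write is visible through a as well
print(a.tolist())
```

Legacy code that depends on aliasing would import the wrapper instead of the
underlying type and otherwise stay unchanged; new code would use the
copy-slicing type directly.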

> 
> So unless we can reach a consensus that will guarantee that 90% of
> existing code will be adapted to the new interfaces, there will be a
> split.
> 
> >    (or use the wrapper), but none of the incompatibilities I'm currently aware
> >    of would, in my eyes, buy one as much as introducing copy-indexing
> >    semantics would. So if things get broken anyway, one might as well take
> 
> I agree, but it also comes at the highest cost. There is absolutely no
> way to identify automatically the code that needs to be adapted, and
> there is no run-time error message in case of failure - just a wrong
> result. None of the other proposed changes is as risky as this one.

Wouldn't an (almost) automatic solution be to simply replace (almost) all
instances of a[b:c] with a.view[b:c] in your legacy code? Even for unusual
cases (like if you heavily mix arrays and lists) you could still autoconvert
by inserting ``if type(foo) == ArrayType:...``, although this would admittedly
be rather messy.  The unnecessary ``.view``s can be eliminated over time and
even if they aren't, no one would have to learn or switch between two
libraries.
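
The (almost) automatic replacement could be sketched roughly as follows. This
is a minimal illustration, not a real conversion tool: the function name
add_view is made up here, ``.view`` is the attribute proposed above and not
an existing API, and the regular expression only handles simple, non-nested
subscripts. It cannot tell arrays from lists and it will also touch text in
strings and comments, which is why the conversion is only "almost" automatic.

```python
import re

# A (possibly dotted) name followed by a subscript that contains a colon,
# e.g. "a[b:c]" or "m[:n]". Plain indexing such as "a[i]" does not match.
_SLICE = re.compile(r"(?<![\w.])([A-Za-z_][\w.]*)(\[[^\[\]]*:[^\[\]]*\])")


def _rewrite(match):
    name, subscript = match.groups()
    if name.endswith(".view"):
        # Already converted: leave it alone so the pass can be repeated.
        return match.group(0)
    return f"{name}.view{subscript}"


def add_view(source):
    """Rewrite name[b:c] as name.view[b:c] (hypothetical .view attribute)."""
    return _SLICE.sub(_rewrite, source)


legacy = "row = a[b:c] + a[i] * m[:n]"
print(add_view(legacy))  # row = a.view[b:c] + a[i] * m.view[:n]
```

Running the pass a second time leaves the converted text unchanged, so it can
be reapplied as legacy code is touched.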

> 
> >    this step (especially since intentional views are, on the whole, used
> >    rather sparingly -- although tracking down these uses in retrospect might
> >    admittedly be unpleasant).
> 
> It is not merely unpleasant, the cost is simply prohibitive.

See above. I personally hope that, even without resorting to something like
the above, converting my code to copy behavior wouldn't take too much effort.
But my code-base is much smaller than yours, I can currently recall only one
case of intended aliasing (which would require a couple of changes), and my
estimate might also prove quite wrong. I have no idea which scenario is
typical.

> 
> > 2. Numarray is supposed to be incorporated into the core. Compromising the
> >    consistency of core python (and code that depends on it) is in my eyes
> >    worse than compromising code written for Numeric.
> 
> I don't see view behaviour as inconsistent with Python. Python has one
> mutable sequence type, the list, with copy behaviour. One type is
> hardly enough to establish a rule.

Well, AFAIK there are actually three mutable sequence types in python core and
all have copy-slicing behavior: list, UserList and array:

    >>> import array
    >>> aa = array.array('d', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    >>> bb = aa[:]
    >>> bb is aa
    0
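
The identity test alone only shows that the slice is a different object. A
slightly stronger check, sketched here for current Python 3, is to modify the
slice and confirm that the original is unchanged. The memoryview part is an
addition for contrast only: it is a present-day built-in whose slices do
alias, and it shows how a write through a view changes the original without
any error being raised.

```python
import array

# array.array: the slice is an independent copy.
aa = array.array("d", [1.0, 2.0, 3.0])
bb = aa[:]
bb[0] = -1.0
print(aa[0])  # 1.0, aa is untouched

# list: same copy-slicing behavior.
ll = [1.0, 2.0, 3.0]
mm = ll[:]
mm[0] = -1.0
print(ll[0])  # 1.0

# memoryview: the slice aliases the underlying buffer instead.
vv = memoryview(aa)[:]
vv[0] = -1.0
print(aa[0])  # -1.0, the write went through to aa
```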

I would suppose that in the grand scheme of things numarray.array is intended
as an eventual replacement for array.array, or not?

Furthermore, list is such a fundamental data type in python that I think it is
actually enough to establish a rule (if the vast majority of sequence types in
3rd-party modules don't have the same semantics, I'd regard it as a strong
argument for your position, but I haven't checked).

> 
> > As a third reason I could claim that there is some hope of a much more
> > widespread adoption of Numeric/numarray as an alternative to matlab etc. in
> > the next couple of years, so that it might be wise to fix things now, but I'd
> > understand if you'd remain unimpressed by that :)
> 
> I'd like to see any supporting evidence. I think this argument is
> based on the reasoning "I would prefer it to be this way, so many
> others would certainly also prefer it, so they would start using NumPy
> if only these changes were made." This is not how decision processes
> work in real life.

Sure, but I didn't try to imply this causality anyway :) My argument wasn't so
much "let's make it really good (where good is what *I* say), then loads of
people will adopt it"; it was more: "Numeric has a good chance to grow
considerably in popularity over the next few years, so it will be much easier
to fix things now than later" (for slicing behavior, now is likely to be the
last chance).

The fact that matlab users are used to copy-on-demand and the fact that many
people (including you, if I understand you correctly) think that copy-slicing
semantics as such (without backward compatibility concerns) are preferable
might have a small influence on people's decision to adopt Numeric, but I
perfectly agree that this influence will be minor compared to other issues.

> 
> On the contrary, people might look at the history of NumPy and decide
> that it is too unreliable to base a serious project on - if they
> changed the interface once, they might do it again. This is a
> particularly important aspect in the OpenSource universe, where there
> are no contracts that promise anything. If you want people to use your

I don't think matlab or similar alternatives make legally binding promises
about backwards compatibility, or do they? I guess it is actually more
difficult to *force* incompatible changes on people with an open source
project than with commercial software, but I agree that splitting or
lighthearted sacrifices of backwards compatibility are more of a temptation
with open source, for one thing because there are usually fewer financial
stakes involved for the authors.

> code, you have to demonstrate that it is reliable, and that applies to
> both the code and the interfaces.

Yes, this is very important and I very much appreciate that you stress these
and similar points in your postings.

But reliability to me also includes the ability for growth -- I not only want
my old code to work in a couple of years, I also want the tool I wrote it in
to remain competitive and this can conflict with backwards-compatibility. I
like the balance python strikes here so far -- the language has improved
significantly (and in my eyes has remained superior to newer competitors such
as ruby) but at the same time for me and most other people transitions between
versions haven't caused too much trouble. This increases the value of my
code-base to me: I can assume that it will still work (or be adapted without
too much effort) in years to come and yet be written in an excellent language
for the job.

Striking this balance is however quite difficult (as can be seen by the heated
discussions in c.l.p), so getting it right will most likely involve
considerable effort (and controversy) within the Numeric community.

alex

-- 
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.Schmolck at gmx.net     http://www.dcs.ex.ac.uk/people/aschmolc/




