[Numpy-discussion] copy on demand

Alexander Schmolck a.schmolck at gmx.net
Thu Jun 13 17:36:05 CDT 2002


"Perry Greenfield" <perry at stsci.edu> writes:

> I'm not sure what you mean. Are you saying that if anything in the
> buffer changes, force all views of the buffer to generate copies
> (rather than try to determine if the change affected only selected

Yes (I suspect that this will be sufficient in practice).

> views)? If so, yes, it is easier, but it still is a non-trivial
> capability to implement.

Sure. But since copy-on-demand is only an optimization and as such doesn't
affect the semantics, it could also be implemented at a later point if the
resources are currently not available. I have little doubt that someone will
eventually add copy-on-demand if the option is kept open. In the meantime,
one could still get all the performance (and alias behavior) of the current
implementation by explicitly using ``.view`` (or ``.sub`` if you prefer) to
create aliases.
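
To make "only an optimization" concrete, here is a minimal sketch of
copy-on-demand slicing in pure Python. It is a toy model, not numarray code:
the class ``CodArray`` and all of its methods are hypothetical names, it is
1-D and list-backed, and it uses the conservative rule discussed above (any
write to a buffer first turns all of its lazy slices into real copies). It
also keeps strong references between an array and its lazy slices, so it
illustrates the semantics only, not the memory bookkeeping.

```python
class CodArray:
    """Toy 1-D array whose slices are lazy copies (hypothetical sketch)."""

    def __init__(self, values):
        self._data = list(values)   # storage owned by this array
        self._parent = None         # set while this is an uncopied slice
        self._slice = None
        self._lazy_children = []    # slices still reading from our storage

    @property
    def is_lazy(self):
        return self._parent is not None

    def _values(self):
        # A lazy slice reads through to its parent; otherwise use own storage.
        if self._parent is not None:
            return self._parent._values()[self._slice]
        return self._data

    def _materialize(self):
        # Perform the delayed copy and detach from the parent.
        if self._parent is not None:
            self._data = list(self._parent._values()[self._slice])
            self._parent._lazy_children.remove(self)
            self._parent = None

    def __getitem__(self, key):
        if isinstance(key, slice):
            child = CodArray(())
            child._parent, child._slice = self, key
            self._lazy_children.append(child)
            return child            # no data copied yet
        return self._values()[key]

    def __setitem__(self, index, value):
        # Conservative rule: before any write, every lazy slice gets its copy.
        self._materialize()
        for child in list(self._lazy_children):
            child._materialize()
        self._data[index] = value

    def tolist(self):
        return list(self._values())


a = CodArray(range(10))
b = a[2:5]          # lazy: shares a's storage for now
c = a[0:3]
c[0] = -1           # writing to a slice never touches a
a[2] = 99           # writing to a first gives b its own copy
print(a.tolist(), b.tolist(), c.tolist())
```

From the outside ``b`` and ``c`` behave exactly as eager copies would; the
only observable difference is when the copying happens.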

I'm becoming increasingly convinced (see below) that copy-slicing-semantics
are much to be preferred as the default, so given the above I don't think that
performance concerns should sway one towards alias-slicing, if enough people
feel that copy semantics as such are preferable.

> > >
> > > The bookkeeping can get pretty messy (if you care about memory usage,
> > > which we definitely do).  Consider this case:
> > >
> > >     >>> a = zeros((5000,5000))
> > >     >>> b = a[0:-10,0:-10]
> > >     >>> c = a[49:51,50]
> > >     >>> del a
> > >     >>> b[50,50] = 1
> > >
> > > Now what happens?  Either we can copy the array for b (which means two
> >
> > ``b`` and ``c`` are copied and then ``a`` is deleted.
> >
> > What does numarray currently keep of a if I do something like the
> > above or:
> >
> > >>> b = a.flat[::-10000]
> > >>> del a
> >
> > ?
> >
> The whole buffer remains in both cases.

OK, so this is then a nice example where even eager copy slicing behavior
would be *significantly* more efficient than the current aliasing behavior --
so copy-on-demand would then on the whole seem to be not just nearly equally
but *more* efficient than alias slicing.
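
The retention effect is not specific to numarray and is easy to reproduce.
A minimal sketch, using the standard library's ``bytearray`` and
``memoryview`` purely as stand-ins for an array buffer and an alias slice
(this is not numarray code, and the sizes are made up for illustration):

```python
# Stand-ins: bytearray plays the array buffer, memoryview the alias slice.
buf = bytearray(1_000_000)            # the "big array"
alias = memoryview(buf)[::-100_000]   # strided alias, like a.flat[::-10000]
eager = alias.tobytes()               # what an eager copy-slice would keep

del buf                               # drop the original name, as in `del a`

# The alias exposes only a handful of bytes, but it still references the
# whole original buffer, which therefore cannot be freed.
print(len(alias), len(alias.obj))
# The eager copy owns just its own bytes.
print(len(eager))
```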

And as far as difficult-to-understand runtime behavior is concerned, the extra
~100MB of useless baggage carried around by b (second case) is, I'd venture to
suspect, less than obvious to the casual observer. In fact I remember one of
my fellow PhD students having significant problems with mysterious memory
consumption (a couple of arrays taking up more than 1GB rather than a few
hundred MB) -- maybe something like the above was involved. That
``A = A[::-1]`` doesn't work (as pointed out by Paul Barrett) will also come as a
surprise to most people.

If I understand all this correctly, I consider it a rather strong case against
alias slicing as default behavior.


> > > Even keeping track of the views associated with a buffer doesn't solve
> > > the problem of an array that is passed to a C extension and is modified
> > > in place.  It would seem that passing an array into a C extension would
> > > always require all the associated views to be turned into copies.
> > > Otherwise we can't guarantee that views won't be modified.
> >
> > Yes -- but only if the C extension is destructive. In that case the user
> > might well be making a mistake in current Numeric if he has views and
> > doesn't want them to be modified by the operation (of course he might know
> > that the inplace operation does not affect the view(s) -- but wouldn't such
> > cases be rather rare?). If he *does* want the views to be modified, he
> > would obviously have to explicitly specify them as such in a
> > copy-on-demand scheme and in the other case he has most likely been
> > prevented from making an error (and can still explicitly use real views if
> > he knows that the inplace operation on the original will not have
> > undesired effects on the "views").
> >
> If the point is that views are susceptible to unexpected changes
> made in place by a C extension, yes, certainly (just as they
> are for changes made in place in Python). But I'm not sure what
> that has to do with the implied copy (even if delayed) being
> broken by extensions written in C. Promising a copy, and not
> honoring it is not the same as not promising it in the first
> place. But I may be misunderstanding your point.
> 

OK, I'll try again, hopefully this is clearer.

In a sentence: I don't see any problems with C extensions in particular that
would arise from copy-on-demand (I might well be overlooking something,
though).

Rick was saying that passing an array to a C extension that performs an
inplace operation on it means that all copies of all its (lazy) views must be
performed.

My point was that this is correct, but I can't see any problem with that,
neither from the point of view of the extension writer, nor from that of
performance, nor from that of the user, nor indeed from that of the numarray
implementors (obviously the copy-on-demand scheme *as such* will be an
effort).

All that is needed is a separate interface for (the minority of) C extensions
that destructively modify their arguments (they only need to call some
function `actualize_views(the_array_or_view)` or whatever at the start -- this
function will obviously be necessary regardless of the C extensions). So
nothing will break, the promises are kept, and no extra work is needed.
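
A rough sketch of how such a hook might look, written in Python rather than
C to keep it short. Everything here is hypothetical: ``Array``,
``lazy_slice`` and ``reverse_in_place`` are illustrative names only, and
``actualize_views`` merely borrows the name suggested above; none of this is
existing numarray or Numeric API.

```python
class Array:
    """Toy list-backed array that can hand out lazy (uncopied) slices."""

    def __init__(self, values):
        self.data = list(values)
        self.source = None      # (parent, slice) while still an uncopied view
        self.lazy_views = []    # lazy views currently reading from self.data

    def lazy_slice(self, s):
        view = Array(())
        view.source = (self, s)
        self.lazy_views.append(view)
        return view

    def values(self):
        if self.source is None:
            return self.data
        parent, s = self.source
        return parent.values()[s]


def _make_private(arr):
    # Perform the delayed copy for one array, if it is still lazy.
    if arr.source is not None:
        parent, s = arr.source
        arr.data = list(parent.values()[s])
        parent.lazy_views.remove(arr)
        arr.source = None


def actualize_views(arr):
    """Give arr and every lazy view of it private storage."""
    _make_private(arr)
    for view in list(arr.lazy_views):
        _make_private(view)


def reverse_in_place(arr):
    # Stands in for a destructive C extension: it calls the hook first,
    # then modifies its argument freely.
    actualize_views(arr)
    arr.data.reverse()


a = Array([1, 2, 3, 4])
v = a.lazy_slice(slice(0, 2))
reverse_in_place(a)
print(a.values(), v.values())   # v still holds the pre-modification values
```

Non-destructive extensions would not need to call the hook at all, which is
why only the minority that modify their arguments are affected.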

It won't be any slower than what would happen with current Numeric, either,
because either the (Numeric) user intended his (aliased) views to be modified
as well or it was a bug. If he intended the views to be modified, he would
explicitly use alias-views under the new scheme and everything would behave
exactly the same.



alex
-- 
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.Schmolck at gmx.net     http://www.dcs.ex.ac.uk/people/aschmolc/




