[Numpy-discussion] array copy-to-self and views
Thu Feb 1 12:52:13 CST 2007
> Zachary Pincus wrote:
>> Hello folks,
>> I recently was trying to write code to modify an array in-place (so
>> as not to invalidate any references to that array)
> I'm not sure what this means exactly.
Say one wants to keep two different variables referencing a single
in-memory list, like so:
a = [1,2,3]
b = a
Now, if 'b' and 'a' go to live in different places (different class
instances or whatever) but we want 'b' and 'a' to always refer to the
same in-memory object, so that 'id(a) == id(b)', we need to make sure
to not assign a brand new list to either one.
That is, if we do something like 'a = [i + 1 for i in a]' then
'id(a) != id(b)'. However, we can do 'a[:] = [i + 1 for i in a]' to
modify a in-place. This is not super-common, but it's also not an
uncommon python idiom.
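The list idiom can be checked directly with id() (plain Python, no numpy involved):

```python
# Two names bound to the same list object.
a = [1, 2, 3]
b = a

# Slice assignment replaces the contents but keeps the same object,
# so 'b' sees the change.
a[:] = [i + 1 for i in a]
print(id(a) == id(b), b)   # True [2, 3, 4]

# Rebinding 'a' creates a new list; 'b' still refers to the old one.
a = [i + 1 for i in a]
print(id(a) == id(b), b)   # False [2, 3, 4]
```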
I was simply pointing out in my email that naïvely translating that
idiom to the numpy case can cause unexpected behavior in the case of
views.
I think that this is unquestionably a bug -- isn't the point of
views that the user shouldn't need to care if a particular array
object is a view or not? Given the lack of methods to query whether
an array is a view, or what it might be a view on, this seems like a
reasonable perspective... I mean, if certain operations produce
completely different results when one of the operands is a view, that
*seems* like a bug. It might not be worth fixing, but I can't see how
that behavior would be considered a feature.
However, I do think there's a legitimate question about whether it
would be worth fixing -- there could be a lot of complicated checks
needed to catch these kinds of corner cases.
>> via the standard
>> python idiom for lists, e.g.:
>> a[:] = numpy.flipud(a)
>> Now, flipud returns a view on 'a', so assigning that to 'a[:]'
>> provides pretty strange results as the buffer that is being read (the
>> view) is simultaneously modified.
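A minimal sketch of why the result can be "strange": if the assignment copies element by element from a view onto the same buffer (an assumption about how the copy proceeds; recent numpy versions may detect the overlap and buffer the copy themselves), later reads see values that were already overwritten. Taking an explicit .copy() of the view first avoids depending on either behavior. The helper naive_copy_into is hypothetical and only models an overlap-unaware copy.

```python
import numpy as np

def naive_copy_into(dst, src):
    """Hypothetical helper: copy src into dst one element at a time,
    with no temporary buffer (models an overlap-unaware copy)."""
    for i in range(len(dst)):
        dst[i] = src[i]

c = np.array([1, 2, 3, 4])
naive_copy_into(c, np.flipud(c))   # src is a view on c's own buffer
print(c)                           # [4 3 3 4] -- not a reversal

# Safe in-place flip: materialise the flipped data before assigning.
a = np.array([1, 2, 3, 4])
b = a                              # second reference to the same ndarray
a[:] = np.flipud(a).copy()
print(b)                           # [4 3 2 1]
```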
> yes, weird. So why not just:
> a = numpy.flipud(a)
> Since flipud returns a view, the new "a" will still be using the same
> data array. Does this satisfy your need above?
Nope -- though 'a' and 'numpy.flipud(a)' share the same data, the
actual ndarray instances are different. This means that any other
references to the 'a' array (made via 'b = a' or whatever) now refer
to the old 'a', not the flipped one.
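The distinction can be seen with np.shares_memory (present in numpy 1.11 and later): after rebinding, the two names share a buffer but are different ndarray objects, and the object 'b' refers to is not flipped.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = a                          # another reference, e.g. stored elsewhere

a = np.flipud(a)               # rebinds 'a' to a new view object
print(a is b)                  # False: different ndarray instances
print(np.shares_memory(a, b))  # True: same underlying data
print(b)                       # [1 2 3 4] -- 'b' still sees the old order
```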
The only other option for sharing arrays is to encapsulate them as
attributes of *another* object, which itself won't change. That seems
a bit clumsy.
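For comparison, the wrapper approach looks roughly like this (ArrayHolder is a hypothetical name). Every party keeps a reference to the holder rather than to the array, so rebinding the attribute is visible to all of them, at the cost of an extra level of indirection.

```python
import numpy as np

class ArrayHolder:
    """Hypothetical container: share the holder, not the array itself."""
    def __init__(self, arr):
        self.arr = arr

h = ArrayHolder(np.array([1, 2, 3, 4]))
other = h                    # e.g. stored in another class instance

h.arr = np.flipud(h.arr)     # rebinding the attribute is safe here
print(other.arr)             # [4 3 2 1]
```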
> It's too bad that to do this you need to know that flipud created a
> view, rather than a copy of the data, as if it were a copy, you would
> need to do the a[:] trick to make sure a kept the same data, but that's
> the price we pay for the flexibility and power of numpy -- the
> alternative is to have EVERYTHING create a copy, but there would be a
> substantial performance hit for that.
Well, Anne's email suggests another alternative -- each time a view
is created, keep track of the original array from whence it came, and
then only make a copy when collisions like the above would take place.
And actually, I suspect that views already need to keep a reference
to their original array in order to keep that array from being
deleted before the view is. But I don't know the guts of numpy well
enough to say for sure.
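In current numpy the guess holds: a view stores a reference to the array whose memory it uses in its .base attribute, and that reference keeps the buffer alive after the original name is gone. A quick check:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
v = np.flipud(a)
print(v.base is a)   # True: the view keeps a reference to its source
print(a.base)        # None: 'a' owns its data

del a                # drop the only name bound to the original array
print(v)             # [4 3 2 1] -- the buffer is still alive via v.base
```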
> NOTE: the docstring doesn't make it clear that a view is created:
> Help on function flipud in module numpy.lib.twodim_base:
> returns an array with the columns preserved and rows flipped in
> the up/down direction. Works on the first dimension of m.
> NOTE2: Maybe these kinds of functions should have an optional flag
> specifying whether you want a view or a copy -- I'd have expected a
> copy in this case!
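A sketch of the suggested flag, written as a hypothetical wrapper; flipud_maybe_copy and its copy parameter are not part of numpy.

```python
import numpy as np

def flipud_maybe_copy(m, copy=False):
    """Hypothetical wrapper: flip along axis 0, returning a view or a copy."""
    flipped = np.flipud(m)            # np.flipud itself returns a view
    return flipped.copy() if copy else flipped

a = np.arange(6).reshape(3, 2)
print(np.shares_memory(a, flipud_maybe_copy(a)))             # True
print(np.shares_memory(a, flipud_maybe_copy(a, copy=True)))  # False
```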
Well, it seems like in most cases one does not need to care whether
one is looking at a view or at an array that owns its data. The only time that comes to
mind is when you're attempting to modify the array in-place, e.g.
a[<something>] = <something else>
Even if the maybe-bug above were easily fixable (again, not sure
about that), you might *still* want to be able to figure out whether 'a'
is a view before such a modification. Whether this needs a runtime
'is_view' method, or just consistent documentation about what returns
a view, isn't clear to me. Certainly the latter couldn't hurt.
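A runtime check along these lines can be approximated with the .base attribute. is_view is a hypothetical helper, and it only says whether an array borrows its memory from another object, not whether it overlaps some particular other array.

```python
import numpy as np

def is_view(arr):
    """Hypothetical helper: True if arr borrows its data from another object."""
    return arr.base is not None

a = np.array([1, 2, 3, 4])
print(is_view(a))              # False
print(is_view(np.flipud(a)))   # True
print(is_view(a.copy()))       # False
```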
> How do you tell if two arrays are views on the same data: is
> checking if they have the same .base reliable?
>>>> a = numpy.array((1,2,3,4))
>>>> b = a.view()
>>>> a.base is b.base
> False
> No, I guess not. Maybe .base should return self if it's the originator
> of the data.
> Is there a reliable way? I usually just test by changing a value in
> one to see if it changes in the other, but that's one heck of a kludge!
>>>> a.__array_interface__['data'] == b.__array_interface__['data']
> seems to work, but that's pretty ugly!
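Current numpy has dedicated functions for this question: np.shares_memory gives an exact answer and np.may_share_memory a cheap, possibly conservative one. Comparing the data pointers from __array_interface__, as above, only matches views that start at the same address and misses offset views such as a[1:]. The data_ptr helper below is just for illustration.

```python
import numpy as np

a = np.array((1, 2, 3, 4))
b = a.view()
c = a[1:]                  # a view starting one element into the buffer

def data_ptr(x):
    # __array_interface__['data'] is a (pointer, read_only) tuple
    return x.__array_interface__['data'][0]

print(data_ptr(a) == data_ptr(b))   # True
print(data_ptr(a) == data_ptr(c))   # False, although c is a view on a
print(np.shares_memory(a, c))       # True
print(np.may_share_memory(a, c))    # True
print(b.base is a, a.base is None)  # True True
```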
Good question. As I mentioned above, I assume that this information
is tracked internally to prevent the 'original' array data from being
deleted before all of its views are; however, I really don't know how it is
done.