[Numpy-discussion] array copy-to-self and views

Zachary Pincus zpincus@stanford....
Thu Feb 1 12:52:13 CST 2007


> Zachary Pincus wrote:
>> Hello folks,
>>
>> I recently was trying to write code to modify an array in-place (so
>> as not to invalidate any references to that array)
>
> I'm not sure what this means exactly.

Say one wants to keep two different variables referencing a single
in-memory list, like so:
a = [1,2,3]
b = a
Now, if 'b' and 'a' go to live in different places (different class  
instances or whatever) but we want 'b' and 'a' to always refer to the  
same in-memory object, so that 'id(a) == id(b)', we need to make sure  
to not assign a brand new list to either one.

That is, if we do something like 'a = [i + 1 for i in a]' then
'id(a) != id(b)'. However, we can do 'a[:] = [i + 1 for i in a]' to
modify a in-place. This is not super-common, but it's also not an
uncommon Python idiom.
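
To make the list case concrete, here is a minimal sketch in plain Python
(the flag names are only for illustration):

```python
a = [1, 2, 3]
b = a                      # 'b' and 'a' name the same list object

a[:] = [i + 1 for i in a]  # slice assignment: modifies the list in-place
same_after_slice_assign = a is b

a = [i + 1 for i in a]     # plain assignment: rebinds 'a' to a new list
same_after_rebind = a is b

print(same_after_slice_assign, same_after_rebind, b)
```

After the slice assignment both names still see the updated list; after
the plain assignment 'b' is left holding the old object.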

In my email I was simply pointing out that naïvely translating that
idiom to the numpy case can cause unexpected behavior in the case of
views.

I think that this is unquestionably a bug -- isn't the point of
views that the user shouldn't need to care if a particular array  
object is a view or not? Given the lack of methods to query whether  
an array is a view, or what it might be a view on, this seems like a  
reasonable perspective... I mean, if certain operations produce  
completely different results when one of the operands is a view, that  
*seems* like a bug. It might not be worth fixing, but I can't see how  
that behavior would be considered a feature.

However, I do think there's a legitimate question about whether it  
would be worth fixing -- there could be a lot of complicated checks  
to catch these kinds of corner cases.

>> via the standard
>> python idiom for lists, e.g.:
>>
>> a[:] = numpy.flipud(a)
>>
>> Now, flipud returns a view on 'a', so assigning that to 'a[:]'
>> provides pretty strange results as the buffer that is being read (the
>> view) is simultaneously modified.
>
> yes, weird. So why not just:
>
> a = numpy.flipud(a)
>
> Since flipud returns a view, the new "a" will still be using the same
> data array. Does this satisfy your need above?

Nope -- though 'a' and 'numpy.flipud(a)' share the same data, the  
actual ndarray instances are different. This means that any other  
references to the 'a' array (made via 'b = a' or whatever) now refer  
to the old 'a', not the flipped one.
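
A small sketch of the difference, assuming a 1-D array for brevity. The
explicit '.copy()' in the second half is my own workaround, not
something proposed in this thread: it makes sure the data being read
does not overlap the data being written, at the cost of a temporary.

```python
import numpy

a = numpy.arange(4)
b = a                          # second reference to the same ndarray

a = numpy.flipud(a)            # rebinds 'a'; 'b' still names the old array
rebound_is_same = a is b
b_after_rebind = b.tolist()    # 'b' is not flipped

c = numpy.arange(4)
d = c
c[:] = numpy.flipud(c).copy()  # copy first, so source and destination don't overlap
inplace_is_same = c is d
d_after_inplace = d.tolist()   # 'd' sees the flipped data
```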

The only other option for sharing arrays is to encapsulate them as  
attributes of *another* object, which itself won't change. That seems  
a bit clumsy.

> It's too bad that to do this you need to know that flipud created a
> view, rather than a copy of the data, as if it were a copy, you would
> need to do the a[:] trick to make sure a kept the same data, but that's
> the price we pay for the flexibility and power of numpy -- the
> alternative is to have EVERYTHING create a copy, but there would be a
> substantial performance hit for that.

Well, Anne's email suggests another alternative -- each time a view  
is created, keep track of the original array from whence it came, and  
then only make a copy when collisions like the above would take place.

And actually, I suspect that views already need to keep a reference  
to their original array in order to keep that array from being  
deleted before the view is. But I don't know the guts of numpy well  
enough to say for sure.
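
As far as I can tell, the '.base' attribute (mentioned further down) is
where that reference shows up from Python. A minimal sketch, assuming
'a' owns its own data:

```python
import numpy

a = numpy.array([1, 2, 3, 4])
v = numpy.flipud(a)      # a view on a's data

owner_base = a.base      # None here, since 'a' owns its data
view_base = v.base       # the array the view was taken from
view_keeps_owner = v.base is a

del a                    # the name is gone, but the view still holds a reference
still_readable = v.tolist()
```

The data stays readable through 'v' after 'del a', which is consistent
with the view keeping its original array alive.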

> NOTE: the docstring doesn't make it clear that a view is created:
>
>>>> help(numpy.flipud)
> Help on function flipud in module numpy.lib.twodim_base:
>
> flipud(m)
>      returns an array with the columns preserved and rows flipped in
>      the up/down direction.  Works on the first dimension of m.
>
> NOTE2: Maybe these kinds of functions should have an optional flag that
> specified whether you want a view or a copy -- I'd have expected a copy
> in this case!

Well, it seems like in most cases one does not need to care whether  
one is looking at a view or an array. The only time that comes to  
mind is when you're attempting to modify the array in-place, e.g.
a[<something>] = <something else>

Even if the maybe-bug above were easily fixable (again, not sure  
about that), you might *still* want to be able to figure out if a  
were a view before such a modification. Whether this needs a runtime  
'is_view' method, or just consistent documentation about what returns  
a view, isn't clear to me. Certainly the latter couldn't hurt.
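
For what it's worth, a rough runtime check could look like the sketch
below. 'is_view' is a hypothetical helper, not an existing numpy
function, and it only tests whether the array borrows its data from
some other object, so an array wrapping an external buffer would also
count as a "view" here.

```python
import numpy

def is_view(arr):
    """Hypothetical helper: True if 'arr' does not own its data buffer."""
    return arr.base is not None

a = numpy.array([[1, 2], [3, 4]])
flipped = numpy.flipud(a)          # shares a's data
copied = numpy.flipud(a).copy()    # owns a fresh buffer

print(is_view(a), is_view(flipped), is_view(copied))
```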

> QUESTION:
> How do you tell if two arrays are views on the same data: is checking if
> they have the same .base reliable?
>
>>>> a = numpy.array((1,2,3,4))
>>>> b = a.view()
>>>> a.base is b.base
> False
>
> No, I guess not. Maybe .base should return self if it's the originator
> of the data.
>
> Is there a reliable way? I usually just test by changing a value in one
> to see if it changes in the other, but that's one heck of a kludge!
>
>>>> a.__array_interface__['data'][0] == b.__array_interface__['data'][0]
> True
>
> seems to work, but that's pretty ugly!

Good question. As I mentioned above, I assume that this information
is tracked internally to prevent the 'original' array data from being
deleted before its views are; however, I really don't know how it is
exposed.
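
A sketch of one way to ask the question directly, assuming a numpy
recent enough to provide 'numpy.shares_memory' (the older
'numpy.may_share_memory' only compares memory bounds, so it can report
overlap where there is none):

```python
import numpy

a = numpy.array((1, 2, 3, 4))
b = a.view()
c = a.copy()

same_base = a.base is b.base            # a.base is None, b.base is a
b_base_is_a = b.base is a
a_b_share = numpy.shares_memory(a, b)   # view and original overlap
a_c_share = numpy.shares_memory(a, c)   # a copy has its own buffer
```

This also shows why comparing 'a.base is b.base' gives False in the
quoted example: the original has no base, while the view's base is the
original.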

Zach

