[Numpy-discussion] Copy vs View for array[array] (was Histograms via indirect index arrays)

Tim Hochberg tim.hochberg at cox.net
Fri Mar 17 15:16:06 CST 2006

Rick White wrote:

>On Fri, 17 Mar 2006, Tim Hochberg wrote:
>>In theory I'm all for view semantics for an array indexed by an array
>>(I'm sure we have a good name for that, but it's escaping me). Indexing
>>in numpy can be confusing enough without some indexing operations
>>returning views and others copies. This is orthogonal to any issues of
>>In practice, I'm a bit skeptical. The result would need to be some sort
>>of pseudo array object (similar to array.flat). Operations on this
>>object would necessarily have worse performance than operations on a
>>normal array due to the added level of indirection. In some
>>circumstances it would also hold onto a lot of memory that might
>>otherwise be freed since it holds a reference to the data for both the
>>original array and the index array.
>Actually I think it is worse than that -- it seems to me that it
>actually has to make a *copy* of the index array.  I don't think
>that we would want to keep only a reference to the index array,
>since if it changed then the view could respond by changing in very
>unexpected ways.  That sounds like a nightmare side-effect to me.

Yeah, that would be bad. Conceivably (there's that word again) one could
implement copy-on-write for the index arrays, but that would be another
can of worms. Anyway, I agree that you would have to at least fake a
copy somehow.

>That's what has always made me think that this is not a good idea,
>even if the bookkeeping of carrying around an unevaluated array+indices
>could be worked out efficiently.  In my applications I sometimes
>use very large index arrays, and I don't want to have to copy them
>unnecessarily.  Generally I much prefer instant evaluation as in
>the current implementation, since that uses the minimum of memory.
>For what it's worth, IDL behaves exactly like the current numpy:
>a[idx] += 1 increments each element by 1 regardless of how many
>times a particular index is included in idx.

I'm not much concerned with that case myself. The case that bothers me (more
in the abstract than in reality, since I don't use index arrays much) is
the following:

 >>> idx1 = slice(0,None,2)
 >>> idx2 = [0,2,4,6,8]
 >>> a = arange(10)
 >>> b = arange(10)
 >>> ai = a[idx1]
 >>> bi = b[idx2]
 >>> ai
array([0, 2, 4, 6, 8])
 >>> bi
array([0, 2, 4, 6, 8])
 >>> ai[1] = bi[1] = 99
 >>> a
array([ 0,  1, 99,  3,  4,  5,  6,  7,  8,  9])
 >>> b
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The fact that 'a' and 'b' diverge here is a wart, although I'm not sure
it's one worth doing anything about.
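
For readers who want to check the two behaviours discussed in this thread,
here is a minimal sketch. It assumes a recent NumPy imported as np; the
names c, d and idx are made up for illustration, and np.shares_memory and
np.add.at are later additions to the library that were not available when
this message was written.

```python
import numpy as np

a = np.arange(10)
b = np.arange(10)

ai = a[0::2]              # slice indexing: a view onto a's data
bi = b[[0, 2, 4, 6, 8]]   # index-array indexing: a fresh copy

print(np.shares_memory(a, ai))  # True
print(np.shares_memory(b, bi))  # False

ai[1] = bi[1] = 99        # only the write through the view reaches the parent
print(a[2], b[2])         # 99 2

# Repeated indices: each listed element is incremented once,
# not once per occurrence in idx.
c = np.zeros(4, dtype=int)
idx = np.array([0, 0, 1])
c[idx] += 1
print(c)                  # [1 1 0 0]

# An accumulating alternative (not part of the original discussion):
d = np.zeros(4, dtype=int)
np.add.at(d, idx, 1)      # unbuffered, so index 0 is counted twice
print(d)                  # [2 1 0 0]
```

The shares_memory check is just a convenient way to see the copy/view split
without mutating anything; the assignment afterwards reproduces the
divergence between 'a' and 'b' shown in the session above.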


