[Numpy-discussion] Copy vs View for array[array] (was Histograms via indirect index arrays)
Tim Hochberg
tim.hochberg at cox.net
Fri Mar 17 15:16:06 CST 2006
Rick White wrote:
>On Fri, 17 Mar 2006, Tim Hochberg wrote:
>
>
>
>>In theory I'm all for view semantics for an array indexed by an array
>>(I'm sure we have a good name for that, but it's escaping me). Indexing
>>in numpy can be confusing enough without some indexing operations
>>returning views and others copies. This is orthogonal to any issues of
>>performance.
>>
>>In practice, I'm a bit skeptical. The result would need to be some sort
>>of pseudo-array object (similar to array.flat). Operations on this
>>object would necessarily have worse performance than operations on a
>>normal array due to the added level of indirection. In some
>>circumstances it would also hold onto a lot of memory that might
>>otherwise be freed, since it holds references to both the original
>>array's data and the index array.
>>
>>
>
>Actually I think it is worse than that -- it seems to me that it
>actually has to make a *copy* of the index array. I don't think
>that we would want to keep only a reference to the index array,
>since if it changed then the view could respond by changing in very
>unexpected ways. That sounds like a nightmare side-effect to me.
>
>
Yeah, that would be bad. Conceivably (there's that word again) one could
implement copy-on-write for the index arrays, but that would be another
can of worms. Anyway, I agree that you would have to at least fake a
copy somehow.
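To make these trade-offs concrete, here is a minimal sketch of such a pseudo-array. It assumes a 1-D base array and scalar integer item access only. `IndexedView` and `materialize` are hypothetical names, not NumPy API. The sketch copies the index array on construction (Rick's point), pays an extra lookup on every element access, and keeps both the base data and its own index copy alive for as long as the view exists.

```python
import numpy as np


class IndexedView:
    """Hypothetical lazy view of base[idx]; not part of NumPy.

    Assumes a 1-D base array and scalar integer item access only.
    """

    def __init__(self, base, idx):
        # Holding this reference keeps the whole base array alive.
        self.base = base
        # Copy the indices so later changes to the caller's index array
        # cannot silently change which elements this view refers to.
        self.idx = np.array(idx, dtype=np.intp, copy=True)

    def __len__(self):
        return len(self.idx)

    def __getitem__(self, i):
        # Extra level of indirection: look up the index, then the data.
        return self.base[self.idx[i]]

    def __setitem__(self, i, value):
        # Writes go through to the base array, as a true view would.
        self.base[self.idx[i]] = value

    def materialize(self):
        # Evaluate into an ordinary array, which is what NumPy's
        # a[idx] does immediately.
        return self.base[self.idx]


a = np.arange(10)
idx = np.array([0, 2, 4, 6, 8])
v = IndexedView(a, idx)
v[1] = 99        # writes through: a[2] is now 99
idx[:] = 0       # no effect on v, which holds its own copy of the indices
print(v.materialize())  # [ 0 99  4  6  8]
```

The copy made in `__init__` is exactly the memory cost objected to below for very large index arrays. Dropping it would bring back the aliasing problem.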
>That's what has always made me think that this is not a good idea,
>even if the bookkeeping of carrying around an unevaluated array+indices
>could be worked out efficiently. In my applications I sometimes
>use very large index arrays, and I don't want to have to copy them
>unnecessarily. Generally I much prefer instant evaluation as in
>the current implementation, since that uses the minimum of memory.
>
>For what it's worth, IDL behaves exactly like the current numpy:
>a[idx] += 1 increments each element by 1 regardless of how many
>times a particular index is included in idx.
>
>
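The duplicate-index behaviour described above is easy to reproduce in current NumPy. `a[idx] += 1` behaves like `a[idx] = a[idx] + 1`: the selected values are read once, incremented, and written back, so a repeated index ends up incremented only once. The index array below is an arbitrary example. NumPy releases newer than this thread also provide `np.add.at` for unbuffered accumulation, and `np.bincount` covers the plain histogram case.

```python
import numpy as np

idx = np.array([1, 1, 3, 3, 3])

# Fancy-index in-place add: each repeated index is incremented only once.
a = np.zeros(5, dtype=int)
a[idx] += 1
print(a)  # [0 1 0 1 0]

# np.add.at performs unbuffered accumulation, so repeats are counted.
b = np.zeros(5, dtype=int)
np.add.at(b, idx, 1)
print(b)  # [0 2 0 3 0]

# For a histogram of non-negative integer indices, np.bincount gives the same counts.
counts = np.bincount(idx, minlength=5)
print(counts)  # [0 2 0 3 0]
```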
I'm not much concerned with that case myself. The case that bothers me
(more in the abstract than in reality, since I don't use index arrays
much) is the following:
>>> from numpy import arange
>>> idx1 = slice(0,None,2)
>>> idx2 = [0,2,4,6,8]
>>> a = arange(10)
>>> b = arange(10)
>>> ai = a[idx1]
>>> bi = b[idx2]
>>> ai
array([0, 2, 4, 6, 8])
>>> bi
array([0, 2, 4, 6, 8])
>>> ai[1] = bi[1] = 99
>>> a
array([ 0, 1, 99, 3, 4, 5, 6, 7, 8, 9])
>>> b
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
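The fact that 'a' and 'b' diverge here is a wart: the slice gives a view,
so writing to ai changes a, while the index list gives a copy, so writing
to bi leaves b untouched. That said, I'm not sure it's worth doing
anything about.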
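The view/copy difference in the session above can be checked directly with `np.shares_memory`, which exists only in NumPy 1.11 and later and so postdates this thread. Here is a minimal sketch repeating the same example:

```python
import numpy as np

a = np.arange(10)
b = np.arange(10)
ai = a[0::2]             # basic slice: returns a view of a
bi = b[[0, 2, 4, 6, 8]]  # integer-list (fancy) index: returns a copy

print(np.shares_memory(a, ai))  # True
print(np.shares_memory(b, bi))  # False

ai[1] = bi[1] = 99
print(a[2], b[2])  # 99 2
```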
-tim