[Numpy-discussion] Array concatenation performance

Sturla Molden sturla@molden...
Fri Jul 16 06:42:55 CDT 2010


Christopher Barker skrev:
>
> However, Python lists hold python objects, so it's bit inefficient, at 
> least in terms of memory use.
>
I guess that is mainly in terms of memory use, as the Python (scalar) 
objects must be created for the call to append. np.array([]) can also be 
inefficient, as Anne explained yesterday, but an appendable ndarray 
would not have that problem. But I don't know how important it is to 
save a few milliseconds worth of cpu time for this. At least not enough 
to convince me to spend my time on it. The memory issue is more 
important though. But arrays with O(1) amortized append are memory 
inefficient on their own. So it might make more sence to save a list of 
equally long short buffers (e.g. ndarrays), and flatten to a contigous 
buffer when needed. At least that is what I have been doing when needing 
to gradually build up an ndarray. The worst case memory use will be like 
this, amortizing the Python object overhead for large N:

list of ndarrays:
   N*elsize    when building up
   2*N*elsize  when compacting
   N*elsize    when used after compacting

appendable array:
   2*N*elsize  when building up
   2*N*elsize  when used before compacting
   3*N*elsize  when compacting
   N*elsize    when used after compacting

Personally I am not convinced about an appendable array, neither speed 
nor memory wise, I just wanted to hear what other people think. And 
there is one more thing that makes me sceptical: We cannot use realloc() 
to grow an appendable array. That is because it would segfault if there 
are array views in use (the views might suddenly get a dangling data 
pointer). And prohibiting realloc(), the performance of an appendable 
array might not be fantastic (still O(1) on average though). But as Anne 
said, this has probably been discussed in details before.

Sturla


More information about the NumPy-Discussion mailing list