[Numpy-discussion] Array concatenation performance
Sturla Molden
sturla@molden...
Fri Jul 16 06:42:55 CDT 2010
Christopher Barker skrev:
>
> However, Python lists hold python objects, so it's bit inefficient, at
> least in terms of memory use.
>
I guess that is mainly in terms of memory use, as the Python (scalar)
objects must be created for the call to append. np.array([]) can also be
inefficient, as Anne explained yesterday, but an appendable ndarray
would not have that problem. But I don't know how important it is to
save a few milliseconds worth of cpu time for this. At least not enough
to convince me to spend my time on it. The memory issue is more
important though. But arrays with O(1) amortized append are memory
inefficient on their own. So it might make more sence to save a list of
equally long short buffers (e.g. ndarrays), and flatten to a contigous
buffer when needed. At least that is what I have been doing when needing
to gradually build up an ndarray. The worst case memory use will be like
this, amortizing the Python object overhead for large N:
list of ndarrays:
N*elsize when building up
2*N*elsize when compacting
N*elsize when used after compacting
appendable array:
2*N*elsize when building up
2*N*elsize when used before compacting
3*N*elsize when compacting
N*elsize when used after compacting
Personally I am not convinced about an appendable array, neither speed
nor memory wise, I just wanted to hear what other people think. And
there is one more thing that makes me sceptical: We cannot use realloc()
to grow an appendable array. That is because it would segfault if there
are array views in use (the views might suddenly get a dangling data
pointer). And prohibiting realloc(), the performance of an appendable
array might not be fantastic (still O(1) on average though). But as Anne
said, this has probably been discussed in details before.
Sturla
More information about the NumPy-Discussion
mailing list