[SciPy-user] How to start with SciPy and NumPy
Sturla Molden
sturla@molden...
Mon Jan 26 08:52:37 CST 2009
> Sturla Molden wrote:
> This is independent of the list implementation, isn't it ? I am quite
> curious to understand how you could get O(1) complexity for a "growable"
> container: if you don't know in advance the number of items, and you add
> O(N) items, how come you can get O(1) complexity ?
Each time the array is re-sized, you add in some extra empty slots. Make
sure the number of extra slots is proportional to the size of the array.
That is:
Worst-case complexity when calling realloc: O(N), i.e. allocating a new
buffer, copy over the data. Time to new realloc: O(N), i.e k*N empty
slots. The worst case complexity for re-sizes using realloc is on average
O(N)/O(N) = O(1). Between realloc's the empty slots are used, so these
appends are O(1). I.e. appends to the growable array is O(1) on average.
This is referred to as "amortized O(1) complexity". The advantage of this
over linked lists is that elements will be stored om a cache coherent
manner. But in both cases the interface to the user will be that of a list
(growable container). Python lists do this, C++ std::vector do this, etc.
It is handy to know of this strategy for NumPy as well; e.g. if you want
to write an ndarray subclass that can grow and shrink dynamically.
> This may be true for one dimensional array, but generally, I think numpy
> array performances come a lot from any item being reachable directly
> from its 'coordinates' (this plus using native types instead of python
> objects of course).
An ndarrays data is a "one-dimensional array" of bytes.
If you jump back and forth using arr[i,j] the speed will depend on the
size of the array. If it is to big to fit in cache this may be slow, if it
is small enough this will be fast.
On the other hand, if you iterate over arr[i,j] in an ordered manner, it
will be fast because the elements are stored cache coherently. That is, if
the array is stored in C order, it is a big chance that arr[i,j+1] will be
in cache when you have retrived arr[i,j].
It is not the coordinates that gives you the speed, it is how the data are
cached by the processor.
Regards,
S. Molden
More information about the SciPy-user
mailing list