[Numpy-discussion] Good way to develop numpy as popular choice!
Benjamin Root
ben.root@ou....
Fri Jun 22 10:05:19 CDT 2012
On Fri, Jun 22, 2012 at 10:25 AM, Travis Oliphant <travis@continuum.io>wrote:
> Accessing individual elements of NumPy arrays is slower than accessing
> individual elements of lists --- around 2.5x-3x slower. NumPy has to do
> more work to figure out what kind of indexing you are trying to do because
> of its flexibility. It also has to create the Python object to return.
> In contrast, the list approach already has the Python objects created and
> you are just returning pointers to them and there is much less flexibility
> in the kinds of indexing you can do.
>
> Simple timings show that a.item(i,j) is about 2x slower than list element
> access (but faster than a[i,j] which is about 2.5x to 3x slower). The
> slowness of a.item is due to the need to create the Python object to return
> (there are just raw bytes there) so it gives some idea of the relative cost
> of each part of the slowness of a[i,j].
>
> Also, math on the array scalars returned from NumPy will be slower than
> math on integers and floats --- because NumPy re-uses the ufunc machinery
> which is not optimized at all for scalars.
>
> The take-away is that NumPy is built for doing vectorized operations on
> bytes of data. It is not optimized for doing element-by-element
> individual access. The right approach there is to just use lists (or use
> a version specialized for the kind of data in the lists that removes the
> boxing and unboxing).
>
> Here are my timings using IPython for NumPy indexing:
>
> 1-D:
>
> In[2]: a = arange(100)
>
> In [3]: %timeit [a.item(i) for i in xrange(100)]
> 10000 loops, best of 3: 25.6 us per loop
>
> In [4]: %timeit [a[i] for i in xrange(100)]
> 10000 loops, best of 3: 31.8 us per loop
>
> In [5]: al = a.tolist()
>
> In [6]: %timeit [al[i] for i in xrange(100)]
> 100000 loops, best of 3: 10.6 us per loop
>
>
>
> 2-D:
>
> In [7]: a = arange(100).reshape(10,10)
>
> In [8]: al = a.tolist()
>
> In [9]: %timeit [al[i][j] for i in xrange(10) for j in xrange(10)]
> 10000 loops, best of 3: 18.6 us per loop
>
> In [10]: %timeit [a[i,j] for i in xrange(10) for j in xrange(10)]
> 10000 loops, best of 3: 44.4 us per loop
>
> In [11]: %timeit [a.item(i,j) for i in xrange(10) for j in xrange(10)]
> 10000 loops, best of 3: 34.2 us per loop
>
>
>
> -Travis
>
>
However, what is the timing/memory cost of converting a large numpy array
that already exists into python list of lists? If all my processing before
the munkres step is using NumPy, converting it into python lists has a
cost. Also, your timings indicate only ~2x slowdown, while the timing
tests done by eat show an order-of-magnitude difference. I suspect there
is great room for improvement before even starting to worry about the array
access issues.
Cheers!
Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20120622/55020bfe/attachment-0001.html
More information about the NumPy-Discussion
mailing list