[Numpy-discussion] Good way to develop numpy as popular choice!
Fri Jun 22 09:25:40 CDT 2012
Accessing individual elements of NumPy arrays is slower than accessing individual elements of lists --- around 2.5x-3x slower. NumPy has to do more work to figure out what kind of indexing you are trying to do because of its flexibility. It also has to create the Python object to return. In contrast, the list approach already has the Python objects created and you are just returning pointers to them and there is much less flexibility in the kinds of indexing you can do.
Simple timings show that a.item(i,j) is about 2x slower than list element access (but faster than a[i,j] which is about 2.5x to 3x slower). The slowness of a.item is due to the need to create the Python object to return (there are just raw bytes there) so it gives some idea of the relative cost of each part of the slowness of a[i,j].
Also, math on the array scalars returned from NumPy will be slower than math on integers and floats --- because NumPy re-uses the ufunc machinery which is not optimized at all for scalars.
The take-away is that NumPy is built for doing vectorized operations on bytes of data. It is not optimized for doing element-by-element individual access. The right approach there is to just use lists (or use a version specialized for the kind of data in the lists that removes the boxing and unboxing).
Here are my timings using IPython for NumPy indexing:
In: a = arange(100)
In : %timeit [a.item(i) for i in xrange(100)]
10000 loops, best of 3: 25.6 us per loop
In : %timeit [a[i] for i in xrange(100)]
10000 loops, best of 3: 31.8 us per loop
In : al = a.tolist()
In : %timeit [al[i] for i in xrange(100)]
100000 loops, best of 3: 10.6 us per loop
In : a = arange(100).reshape(10,10)
In : al = a.tolist()
In : %timeit [al[i][j] for i in xrange(10) for j in xrange(10)]
10000 loops, best of 3: 18.6 us per loop
In : %timeit [a[i,j] for i in xrange(10) for j in xrange(10)]
10000 loops, best of 3: 44.4 us per loop
In : %timeit [a.item(i,j) for i in xrange(10) for j in xrange(10)]
10000 loops, best of 3: 34.2 us per loop
On Jun 22, 2012, at 8:48 AM, Benjamin Root wrote:
> On Fri, Jun 22, 2012 at 9:42 AM, eat <firstname.lastname@example.org> wrote:
> On Fri, Jun 22, 2012 at 7:51 AM, Gael Varoquaux <email@example.com> wrote:
> On Thu, Jun 21, 2012 at 08:59:09PM -0400, Benjamin Root wrote:
> > > munkres seems to be a pure python implementation ;-).
> > Oops! I could have sworn that I once tried one named munkres that used
> > numpy. But that was several years ago.
> > There is a development branch of sk-learn with an implementation of the
> > hungarian assignment solver using numpy. It will even do non-square
> > matrices and matrices with an empty dimension.
> Yes, absolutely, thanks to Ben:
> I never merged this in the main scikit-learn tree, because munkres is not
> used so far. Maybe I should merge it in the main tree, or maybe it should
> be added to scipy or numpy.
> I made some simple timing comparisons (see attached picture) between numpy based hungarian and pure python shortest path based hungarian_sp. It seems that pure python based implementation outperforms numpy based implementation. Timings are averaged over five runs.
> The difference cannot totally be explained by different algorithms (although shortest path based seem to scale better). Rather the heavy access to rows and columns seem to favor list of lists. So this type of algorithms may indeed be real challenges for numpy.
> Thanks for that analysis. Personally, I never needed high-performance so I never bothered to optimize it. However, it does appear that there is an order-of-magnitude difference between the two, and so it might be worth it to see what can be done to fix that.
> Ben Root
> NumPy-Discussion mailing list
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the NumPy-Discussion