[Numpy-discussion] Index Array Performance

Wes McKinney wesmckinn@gmail....
Mon Feb 13 18:46:21 CST 2012


On Mon, Feb 13, 2012 at 7:30 PM, Nathaniel Smith <njs@pobox.com> wrote:
> How would you fix it? I shouldn't speculate without profiling, but I'll be
> naughty. Presumably the problem is that python turns that into something
> like
>
> hist[i,j] = hist[i,j] + 1
>
> which means there's no way for numpy to avoid creating a temporary array. So
> maybe this could be fixed by adding a fused __inplace_add__ protocol to the
> language (and similarly for all the other inplace operators), but that seems
> really unlikely. Fundamentally this is just the sort of optimization
> opportunity you miss when you don't have a compiler with a global view;
> Fortran or c++ expression templates will win every time. Maybe pypy will fix
> it someday.
>
> Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work?
>
> - N

Nope, don't buy it:

In [33]: timeit arr.__iadd__(1)
1000 loops, best of 3: 1.13 ms per loop

In [37]: timeit arr[:] += 1
1000 loops, best of 3: 1.13 ms per loop

- Wes

> On Feb 14, 2012 12:18 AM, "Wes McKinney" <wesmckinn@gmail.com> wrote:
>>
>> On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver
>> <m.oliver@jacobs-university.de> wrote:
>> > Hi,
>> >
>> > I have a short piece of code where the use of an index array "feels
>> > right", but incurs a severe performance penalty: It's about an order
>> > of magnitude slower than all other operations with arrays of that
>> > size.
>> >
>> > It comes up in a piece of code which is doing a large number of "on
>> > the fly" histograms via
>> >
>> >  hist[i,j] += 1
>> >
>> > where i is an array with the bin index to be incremented and j is
>> > simply enumerating the histograms.  I attach a full short sample code
>> > below which shows how it's being used in context, and corresponding
>> > timeit output from the critical code section.
>> >
>> > Questions:
>> >
>> > - Is this a matter of principle, or due to an inefficient
>> >  implementation?
>> > - Is there an equivalent way of doing it which is fast?
>> >
>> > Regards,
>> > Marcel
>> >
>> > =================================================================
>> >
>> > #! /usr/bin/env python
>> > # Plot the bifurcation diagram of the logistic map
>> >
>> > from pylab import *
>> >
>> > Nx = 800
>> > Ny = 600
>> > I = 50000
>> >
>> > rmin = 2.5
>> > rmax = 4.0
>> > ymin = 0.0
>> > ymax = 1.0
>> >
>> > rr = linspace (rmin, rmax, Nx)
>> > x = 0.5*ones(rr.shape)
>> > hist = zeros((Ny+1,Nx), dtype=int)
>> > j = arange(Nx)
>> >
>> > dy = ymax/Ny
>> >
>> > def f(x):
>> >    return rr*x*(1.0-x)
>> >
>> > for n in xrange(1000):
>> >    x = f(x)
>> >
>> > for n in xrange(I):
>> >    x = f(x)
>> >    i = array(x/dy, dtype=int)
>> >    hist[i,j] += 1
>> >
>> > figure()
>> >
>> > imshow(hist,
>> >       cmap='binary',
>> >       origin='lower',
>> >       interpolation='nearest',
>> >       extent=(rmin,rmax,ymin,ymax),
>> >       norm=matplotlib.colors.LogNorm())
>> >
>> > xlabel ('$r$')
>> > ylabel ('$x$')
>> >
>> > title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$')
>> >
>> > show()
>> >
>> > ====================================================================
>> >
>> > In [4]: timeit y=f(x)
>> > 10000 loops, best of 3: 19.4 us per loop
>> >
>> > In [5]: timeit i = array(x/dy, dtype=int)
>> > 10000 loops, best of 3: 22 us per loop
>> >
>> > In [6]: timeit img[i,j] += 1
>> > 10000 loops, best of 3: 119 us per loop
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>> This suggests to me that fancy indexing could be quite a bit faster in
>> this case:
>>
>> In [40]: timeit hist[i,j] += 110000 loops, best of 3: 58.2 us per loop
>> In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1)
>> 10000 loops, best of 3: 20.6 us per loop
>>
>> I wrote a simple Cython method
>>
>> def fancy_inc(ndarray[int64_t, ndim=2] values,
>>              ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc):
>>    cdef:
>>        Py_ssize_t i, n = len(iarr)
>>
>>    for i in range(n):
>>        values[iarr[i], jarr[i]] += inc
>>
>> that does even faster
>>
>> In [8]: timeit sbx.fancy_inc(hist, i, j, 1)
>> 100000 loops, best of 3: 4.85 us per loop
>>
>> About 10% faster if bounds checking and wraparound are disabled.
>>
>> Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list?
>>
>> - Wes
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


More information about the NumPy-Discussion mailing list