# [Numpy-discussion] speeding up the following expression

josef.pktd@gmai... josef.pktd@gmai...
Sat Nov 12 11:16:50 CST 2011

```On Sat, Nov 12, 2011 at 11:32 AM, Warren Weckesser
<warren.weckesser@enthought.com> wrote:
>
>
> On Sat, Nov 12, 2011 at 9:59 AM, <josef.pktd@gmail.com> wrote:
>>
>> On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser
>> <warren.weckesser@enthought.com> wrote:
>> >
>> >
>> > On Sat, Nov 12, 2011 at 6:43 AM, <josef.pktd@gmail.com> wrote:
>> >>
>> >> On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu <zyzhu2000@gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > I am playing with multiple ways to speed up the following expression
>> >> > (it is in the inner loop):
>> >> >
>> >> >
>> >> > C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
>> >> >
>> >> > where C is an array of about 200-300 elements, M=len(C), a, b, c are
>> >> > scalars.
>> >> >
>> >> > I played with numexpr, but it was way slower than directly using
>> >> > numpy. It would be nice if I could create a Mx3 matrix without
>> >> > copying
>> >> > memory and so I can use dot() to calculate the whole thing.
>> >> >
>> >> > Can anyone help with giving some advices to make this faster?
>> >>
>> >> looks like a np.convolve(C, [a,b,c])  to me except for the boundary
>> >> conditions.
>> >
>> >
>> >
>> > As Josef pointed out, this is a convolution.  There are (at least)
>> > three convolution functions in numpy+scipy that you could use:
>> > numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d.
>> > Of these, scipy.ndimage.convolve1d is the fastest.  However, it doesn't
>> > beat the simple expression
>> >    a*x[2:] + b*x[1:-1] + c*x[:-2]
>> > Your idea of forming a matrix without copying memory can be done using
>> > "stride tricks", and for arrays of the size you are interested in, it
>> > computes the result faster than the simple expression (see below).
>> >
>> > Another fast alternative is to use one of the inline code generators.
>> > This example is a easy to implement with scipy.weave.blitz, and it gives
>> > a big speedup.
>> >
>> > Here's a test script:
>> >
>> > #----- convolve1dtest.py -----
>> >
>> >
>> > import numpy as np
>> > from numpy.lib.stride_tricks import as_strided
>> > from scipy.ndimage import convolve1d
>> > from scipy.weave import blitz
>> >
>> > # weighting coefficients
>> > a = -0.5
>> > b = 1.0
>> > c = -0.25
>> > w = np.array((a,b,c))
>> > # Reversed w:
>> > rw = w[::-1]
>> >
>> > # Length of C
>> > n = 250
>> >
>> > # The original version of the calculation:
>> > # Some dummy data
>> > C = np.arange(float(n))
>> > C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
>> > # Save for comparison.
>> > C0 = C.copy()
>> >
>> > # Do it again using a matrix multiplication.
>> > C = np.arange(float(n))
>> > # The "virtual" matrix view of C.
>> > V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0],
>> > C.strides[0]))
>> > C[1:-1] = np.dot(V, rw)
>> > C1 = C.copy()
>> >
>> > # Again, with convolve1d this time.
>> > C = np.arange(float(n))
>> > C[1:-1] = convolve1d(C, w)[1:-1]
>> > C2 = C.copy()
>> >
>> > # scipy.weave.blitz
>> > C = np.arange(float(n))
>> > # Must work with a copy, D, in the formula, because blitz does not use
>> > # a temporary variable.
>> > D = C.copy()
>> > expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]"
>> > blitz(expr, check_size=0)
>> > C3 = C.copy()
>> >
>> >
>> > # Verify that all the methods give the same result.
>> > print np.all(C0 == C1)
>> > print np.all(C0 == C2)
>> > print np.all(C0 == C3)
>> >
>> > #-----
>> >
>> > And here's a snippet from an ipython session:
>> >
>> > In [51]: run convolve1dtest.py
>> > True
>> > True
>> > True
>> >
>> > In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
>> > 100000 loops, best of 3: 16.5 us per loop
>> >
>> > In [53]: %timeit C[1:-1] = np.dot(V, rw)
>> > 100000 loops, best of 3: 9.84 us per loop
>> >
>> > In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
>> > 100000 loops, best of 3: 18.7 us per loop
>> >
>> > In [55]: %timeit D = C.copy(); blitz(expr, check_size=0)
>> > 100000 loops, best of 3: 4.91 us per loop
>> >
>> >
>> >
>> > scipy.weave.blitz is fastest (but note that blitz has already been
>> > called
>> > once, so the time shown does not include the compilation required in
>> > the first call).  You could also try scipy.weave.inline, cython.inline,
>> > or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).
>> >
>> > Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple
>> > expression or convolve1d.  However, if you also have to set up V inside
>> > your inner loop, the speed gain will be lost.  The relative speeds also
>> > depend on the size of C.  For large C, the simple expression is faster
>> > than the matrix multiplication by V (but blitz is still faster).  In
>> > the following, I have changed n to 2500 before running
>> > convolve1dtest.py:
>> >
>> > In [56]: run convolve1dtest.py
>> > True
>> > True
>> > True
>> >
>> > In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
>> > 10000 loops, best of 3: 29.5 us per loop
>> >
>> > In [58]: %timeit C[1:-1] = np.dot(V, rw)
>> > 10000 loops, best of 3: 56.4 us per loop
>> >
>> > In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
>> > 10000 loops, best of 3: 37.3 us per loop
>> >
>> > In [60]: %timeit D = C.copy(); blitz(expr, check_size=0)
>> > 100000 loops, best of 3: 10.3 us per loop
>> >
>> >
>> > blitz wins, the simple numpy expression is a distant second, and now
>> > the matrix multiplication is slowest.
>> >
>> > I hope that helps--I know I learned quite a bit. :)
>>
>> Interesting, two questions
>>
>> does scipy.signal convolve have a similar overhead as np.convolve1d ?
>
>
> Did you mean np.convolve?  There is no np.convolve1d.   Some of the tests
> that I've done with convolution functions are here:
>     http://www.scipy.org/Cookbook/ApplyFIRFilter
> I should add np.convolve to that page.  For the case considered here,
> np.convolve is a bit slower than scipy.ndimage.convolve1d, but for larger
> arrays, it looks like np.convolve can be much faster.
>
>
>>
>> memory:
>> the blitz code doesn't include the array copy (D), so the timing might
>
>
> Look again at my %timeit calls in the ipython snippets.  :)
>

Sorry, I was reading way too fast or selectively (skipping the
imports, namespaces for example).
(I thought convolve1d was numpy not ndimage)

I tried out your cookbook script a while ago, it's nice, I had only a
rough idea about the timing before.
Compared to the current case the x is much longer, (unless I don't

Thanks,
Josef

>
>>
>> I assume the as_strided call doesn't allocate any memory yet, so the
>> timing should be correct. (or is this your comment about setting up V
>> in the inner loop)
>>
>
>
> Yes, that's what I meant; if V has to be created inside the inner loop (so
> as_strided is called in the loop), the time it takes to create V eliminates
> the benefit of using the matrix approach.
>
> Warren
>
>
>>
>> Josef
>>
>> >
>> > Warren
>> >
>> >
>> >>
>> >> Josef
>> >>
>> >>
>> >> >
>> >> > Thanks,
>> >> > G
>> >> > _______________________________________________
>> >> > NumPy-Discussion mailing list
>> >> > NumPy-Discussion@scipy.org
>> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >> >
>> >> _______________________________________________
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
```