# [Numpy-discussion] speeding up the following expression

Geoffrey Zhu zyzhu2000@gmail....
Sat Nov 12 22:07:44 CST 2011

```Hi Warren,

On Sat, Nov 12, 2011 at 9:31 AM, Warren Weckesser
<warren.weckesser@enthought.com> wrote:
>
>
> On Sat, Nov 12, 2011 at 6:43 AM, <josef.pktd@gmail.com> wrote:
>>
>> On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu <zyzhu2000@gmail.com> wrote:
>> > Hi,
>> >
>> > I am playing with multiple ways to speed up the following expression
>> > (it is in the inner loop):
>> >
>> >
>> > C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
>> >
>> > where C is an array of about 200-300 elements, M=len(C), a, b, c are
>> > scalars.
>> >
>> > I played with numexpr, but it was way slower than directly using
>> > numpy. It would be nice if I could create a Mx3 matrix without copying
>> > memory and so I can use dot() to calculate the whole thing.
>> >
>> > Can anyone help with giving some advices to make this faster?
>>
>> looks like a np.convolve(C, [a,b,c])  to me except for the boundary
>> conditions.
>
>
>
> As Josef pointed out, this is a convolution.  There are (at least)
> three convolution functions in numpy+scipy that you could use:
> numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d.
> Of these, scipy.ndimage.convolve1d is the fastest.  However, it doesn't
> beat the simple expression
>    a*x[2:] + b*x[1:-1] + c*x[:-2]
> Your idea of forming a matrix without copying memory can be done using
> "stride tricks", and for arrays of the size you are interested in, it
> computes the result faster than the simple expression (see below).
>
> Another fast alternative is to use one of the inline code generators.
> This example is a easy to implement with scipy.weave.blitz, and it gives
> a big speedup.
>
> Here's a test script:
>
> #----- convolve1dtest.py -----
>
>
> import numpy as np
> from numpy.lib.stride_tricks import as_strided
> from scipy.ndimage import convolve1d
> from scipy.weave import blitz
>
> # weighting coefficients
> a = -0.5
> b = 1.0
> c = -0.25
> w = np.array((a,b,c))
> # Reversed w:
> rw = w[::-1]
>
> # Length of C
> n = 250
>
> # The original version of the calculation:
> # Some dummy data
> C = np.arange(float(n))
> C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> # Save for comparison.
> C0 = C.copy()
>
> # Do it again using a matrix multiplication.
> C = np.arange(float(n))
> # The "virtual" matrix view of C.
> V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0], C.strides[0]))
> C[1:-1] = np.dot(V, rw)
> C1 = C.copy()
>
> # Again, with convolve1d this time.
> C = np.arange(float(n))
> C[1:-1] = convolve1d(C, w)[1:-1]
> C2 = C.copy()
>
> # scipy.weave.blitz
> C = np.arange(float(n))
> # Must work with a copy, D, in the formula, because blitz does not use
> # a temporary variable.
> D = C.copy()
> expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]"
> blitz(expr, check_size=0)
> C3 = C.copy()
>
>
> # Verify that all the methods give the same result.
> print np.all(C0 == C1)
> print np.all(C0 == C2)
> print np.all(C0 == C3)
>
> #-----
>
> And here's a snippet from an ipython session:
>
> In [51]: run convolve1dtest.py
> True
> True
> True
>
> In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> 100000 loops, best of 3: 16.5 us per loop
>
> In [53]: %timeit C[1:-1] = np.dot(V, rw)
> 100000 loops, best of 3: 9.84 us per loop
>
> In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
> 100000 loops, best of 3: 18.7 us per loop
>
> In [55]: %timeit D = C.copy(); blitz(expr, check_size=0)
> 100000 loops, best of 3: 4.91 us per loop
>
>
>
> scipy.weave.blitz is fastest (but note that blitz has already been called
> once, so the time shown does not include the compilation required in
> the first call).  You could also try scipy.weave.inline, cython.inline,
> or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).
>
> Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple
> expression or convolve1d.  However, if you also have to set up V inside
> your inner loop, the speed gain will be lost.  The relative speeds also
> depend on the size of C.  For large C, the simple expression is faster
> than the matrix multiplication by V (but blitz is still faster).  In
> the following, I have changed n to 2500 before running convolve1dtest.py:
>
> In [56]: run convolve1dtest.py
> True
> True
> True
>
> In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> 10000 loops, best of 3: 29.5 us per loop
>
> In [58]: %timeit C[1:-1] = np.dot(V, rw)
> 10000 loops, best of 3: 56.4 us per loop
>
> In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
> 10000 loops, best of 3: 37.3 us per loop
>
> In [60]: %timeit D = C.copy(); blitz(expr, check_size=0)
> 100000 loops, best of 3: 10.3 us per loop
>
>
> blitz wins, the simple numpy expression is a distant second, and now
> the matrix multiplication is slowest.
>
> I hope that helps--I know I learned quite a bit. :)
>
>
> Warren

Thanks for the very helpful response. Fortunately I don't have to set
up the matrix every time as I can work on the same C over and over
again inside the loop. It's counterintuitive to see that the
performance of dot() degrades as n goes up. If I didn't see the
results, I would have guessed otherwise.

Thanks again!

Geoffrey
```