[SciPy-User] fast small matrix multiplication with cython?

Skipper Seabold jsseabold@gmail....
Tue Dec 7 10:37:34 CST 2010

On Tue, Dec 7, 2010 at 3:51 AM, Dag Sverre Seljebotn
<dagss@student.matnat.uio.no> wrote:
> On 12/07/2010 07:56 AM, Fernando Perez wrote:
>> Hi Skipper,
>> On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold<jsseabold@gmail.com>  wrote:
>>> I'm wondering if anyone might have a look at my cython code that does
>>> matrix multiplication and see where I can speed it up or offer some
>>> pointers/reading.  I'm new to Cython and my knowledge of C is pretty
>>> basic, based on trial and (mostly) error, so I am sure the code is
>>> still very naive.
>> A few years ago I had a similar problem, and I ended up getting a very
>> significant speedup by hand-coding a very unsafe, but very fast pure C
>> extension just to compute these inner products.  This was basically a
>> replacement for dot() that would only work with double precision
>> inputs of compatible dimensions and would happily segfault with
>> anything else, but it ran very fast.  The inner loop is implemented
>> completely naively, but it still beats calls to BLAS (even linked with
>> ATLAS) for small matrix dimensions (my case was also up to ~ 15x15).
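The hand-coded inner loop Fernando describes might look roughly like the sketch below. This is not his actual extension, just a minimal illustration of the "unsafe but fast" idea: a completely naive triple loop over row-major double arrays, with no dimension or type checking, which for very small matrices can beat the overhead of a BLAS call.

```c
#include <assert.h>

/* Naive row-major matrix multiply: C = A * B, where A is m x k and
 * B is k x n.  Deliberately does no bounds or type checking -- the
 * caller must supply compatible double arrays, mirroring the
 * "happily segfaults on bad input" trade-off described above. */
static void matmul_naive(const double *A, const double *B, double *C,
                         int m, int k, int n)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
    }
}
```

For the ~15x15 sizes mentioned, the whole working set fits in L1 cache, which is why even this unblocked loop is competitive there.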
> Another idea: If the matrices are more in the intermediate range, here's
> a Cython library for calling BLAS more directly:
> http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-math/

I actually tried to use tokyo, but I couldn't get it to build out of
the box against the ATLAS I compiled a few days ago.  A few changes to
setup.py didn't fix it, so I gave up.

> For intermediate-size matrices the use of SSE instructions should be
> able to offset any call overhead. Try to steer clear of using NumPy for
> slicing, though; instead, one should do pointer arithmetic...

Right. Thanks.
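As a hypothetical illustration of the pointer-arithmetic advice above: instead of materializing a NumPy slice (which allocates a view object and pays Python-level overhead per access), one can step a raw pointer through the buffer with the appropriate stride. The sketch below sums a column of a row-major array this way; the function name and signature are made up for the example.

```c
#include <assert.h>

/* Sum column j of an m x n row-major double array by stepping a
 * pointer with stride n -- the pointer-arithmetic equivalent of the
 * slice A[:, j], with no intermediate view created. */
static double sum_column(const double *A, int m, int n, int j)
{
    double acc = 0.0;
    const double *p = A + j;          /* first element of column j */
    for (int i = 0; i < m; i++, p += n)
        acc += *p;
    return acc;
}
```

In Cython the same pattern applies: obtain the data pointer and strides from the buffer once, then index with plain C arithmetic inside the hot loop.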
