[SciPy-User] fast small matrix multiplication with cython?
Tue Dec 7 10:37:34 CST 2010
On Tue, Dec 7, 2010 at 3:51 AM, Dag Sverre Seljebotn wrote:
> On 12/07/2010 07:56 AM, Fernando Perez wrote:
>> Hi Skipper,
>> On Mon, Dec 6, 2010 at 2:34 PM, Skipper Seabold <email@example.com> wrote:
>>> I'm wondering if anyone might have a look at my cython code that does
>>> matrix multiplication and see where I can speed it up or offer some
>>> pointers/reading. I'm new to Cython and my knowledge of C is pretty
>>> basic, based on trial and (mostly) error, so I am sure the code is
>>> still very naive.
>> A few years ago I had a similar problem, and I ended up getting a very
>> significant speedup by hand-coding a very unsafe, but very fast pure C
>> extension just to compute these inner products. This was basically a
>> replacement for dot() that would only work with double precision
>> inputs of compatible dimensions and would happily segfault with
>> anything else, but it ran very fast. The inner loop is implemented
>> completely naively, but it still beats calls to BLAS (even linked with
>> ATLAS) for small matrix dimensions (my case was also up to ~ 15x15).
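
For readers following along, here is a minimal sketch of what such a naive inner loop might look like. It is not the extension described above, which is not shown in this thread; the function name small_matmul and its argument order are hypothetical, and it assumes row-major, contiguous double arrays of compatible dimensions. Like the original, it does no shape or dtype checking; the only guard is a debug-only NULL assert that compiles out with -DNDEBUG.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions):
 * computes C = A * B for row-major, contiguous double arrays.
 * A is m x k, B is k x n, C is m x n.
 * No shape checks: wrong dimensions read or write out of bounds. */
void small_matmul(const double *a, const double *b, double *c,
                  size_t m, size_t k, size_t n)
{
    /* Debug-only guard; disabled when compiled with -DNDEBUG. */
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            /* Inner product of row i of A with column j of B. */
            for (size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

For matrices this small the whole working set fits in cache, so the plain triple loop avoids the per-call setup cost that a general-purpose routine pays.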
> Another idea: If the matrices are more in the intermediate range, here's
> a Cython library for calling BLAS more directly:
I actually tried to use tokyo, but out of the box it wouldn't build
against the ATLAS I compiled a few days ago. A few changes to setup.py
didn't fix it, so I gave up.
> For intermediate-size matrices the use of SSE instructions should be
> able to offset any call overhead. Try to stay clear of using NumPy for
> slicing, though; instead, one should do pointer arithmetic...
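
One way to read the pointer-arithmetic suggestion: instead of slicing a sub-block out of a larger array (which creates a new array object on every call), pass a pointer to the block's first element together with the parent array's row stride. The sketch below is only an illustration under that assumption; block_matmul and the lda/ldb/ldc parameter names are hypothetical (the names follow the BLAS "leading dimension" convention, but this is not a BLAS call), and it assumes row-major double data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions):
 * computes C = A * B where A, B and C may be sub-blocks of larger
 * row-major arrays. lda, ldb and ldc are the row strides, in elements,
 * of the parent arrays that a, b and c point into.
 * A is m x k, B is k x n, C is m x n. */
void block_matmul(const double *a, size_t lda,
                  const double *b, size_t ldb,
                  double *c, size_t ldc,
                  size_t m, size_t k, size_t n)
{
    /* Debug-only guards; disabled when compiled with -DNDEBUG. */
    assert(a != NULL && b != NULL && c != NULL);
    assert(lda >= k && ldb >= n && ldc >= n);

    for (size_t i = 0; i < m; ++i) {
        /* Step to row i of each block by adding the parent stride. */
        const double *a_row = a + i * lda;
        double *c_row = c + i * ldc;
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p) {
                acc += a_row[p] * b[p * ldb + j];
            }
            c_row[j] = acc;
        }
    }
}
```

As a usage example, given a 4x4 row-major array big, the 2x2 block starting at row 1, column 1 can be multiplied by the 2x2 block starting at row 0, column 2 with block_matmul(big + 5, 4, big + 2, 4, out, 2, 2, 2, 2), where out is a contiguous 2x2 result buffer. No intermediate arrays are created.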