[SciPy-User] fast small matrix multiplication with cython?

josef.pktd@gmai... josef.pktd@gmai...
Mon Dec 6 18:33:13 CST 2010


On Mon, Dec 6, 2010 at 5:34 PM, Skipper Seabold <jsseabold@gmail.com> wrote:
> I'm wondering if anyone might have a look at my cython code that does
> matrix multiplication and see where I can speed it up or offer some
> pointers/reading.  I'm new to Cython and my knowledge of C is pretty
> basic based on trial and (mostly) error, so I am sure the code is
> still very naive.
>
> import numpy as np
> from matmult import dotAB, multAB
>
> A = np.array([[ 1.,  3.,  4.],
>                   [ 5.,  6.,  3.]])
> B = A.T.copy()
>
> timeit dotAB(A,B)
> # 1 loops, best of 3: 826 ms per loop
>
> timeit multAB(A,B)
> # 1 loops, best of 3: 1.16 s per loop
>
> As you can see my multAB results in a negative speedup of about .75.
>
> I compile the cython code with
>
> cython -a matmult.pyx
> gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing
> -I/usr/include/python2.6
> -I/usr/local/lib/python2.6/dist-packages/numpy/core/include/ -o
> matmult.so matmult.c
>
> Cython code is attached and inlined below.
>
> Profile is here (some of which I don't understand why there are
> bottlenecks) http://eagle1.american.edu/~js2796a/matmult/matmult.html
> -----------------------------------------------------------
>
> from numpy cimport float64_t, ndarray, NPY_DOUBLE, npy_intp
> cimport cython
> from numpy import dot
>
> ctypedef float64_t DOUBLE
>
> cdef extern from "numpy/arrayobject.h":
>    cdef void import_array()
>    cdef object PyArray_SimpleNew(int nd, npy_intp *dims, int typenum)
>
> import_array()
>
> @cython.boundscheck(False)
> @cython.wraparound(False)
> cdef inline object matmult(ndarray[DOUBLE, ndim=2, mode='c'] A,
>                    ndarray[DOUBLE, ndim=2, mode='c'] B):
>    cdef int lda = A.shape[0]
>    cdef int n = B.shape[1]
>    cdef npy_intp *dims = [lda, n]
>    cdef ndarray[DOUBLE, ndim=2] out = PyArray_SimpleNew(2, dims, NPY_DOUBLE)
>    cdef int i,j,k
>    cdef double s
>    for i in xrange(lda):
>        for j in xrange(n):
>            s = 0
>            for k in xrange(A.shape[1]):
>                s += A[i,k] * B[k,j]
>            out[i,j] = s
>    return out
>
> def multAB(ndarray[DOUBLE, ndim=2] A, ndarray[DOUBLE, ndim=2] B):
>    for i in xrange(1000000):
>        C = matmult(A,B)
>    return C

Does this generate c code, since it's not a cdef ? (I haven't updated
cython in a while.)

I guess you would want to have the entire loop in c.


Josef

>
> def dotAB(ndarray[DOUBLE, ndim=2] A, ndarray[DOUBLE, ndim=2] B):
>    for i in xrange(1000000):
>        C = dot(A,B)
>    return C
>
> Skipper
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>


More information about the SciPy-User mailing list