[Numpy-discussion] Calling scipy blas from cython is extremely slow
Sergio Callegari
sergio.callegari@gmail....
Sat Feb 23 09:33:14 CST 2013
Hi,
following the excellent advice of V. Armando Sole, I have finally succeeded in
calling the BLAS routines shipped with scipy from cython.
I am doing this to avoid shipping an extra BLAS library with a project of
mine that uses scipy but has some parts coded in cython for extra speed.
So far I have managed to get things working on Linux. Here is what I do:
The following code snippet gives me the dgemv pointer (which is a pointer to a
Fortran function, even though it comes from scipy.linalg.blas.cblas, which is
weird).

from cpython cimport PyCObject_AsVoidPtr
cimport numpy as np  # needed for np.PyArray_DATA in the loop further down
import scipy as sp
__import__('scipy.linalg.blas')

# Fortran calling convention: every argument is passed by pointer
ctypedef void (*dgemv_ptr) (char *trans, int *m, int *n,\
    double *alpha, double *a, int *lda, double *x,\
    int *incx,\
    double *beta, double *y, int *incy)

cdef dgemv_ptr dgemv=<dgemv_ptr>PyCObject_AsVoidPtr(\
    sp.linalg.blas.cblas.dgemv._cpointer)

Then I can call dgemv in a tight loop, defining the constants first and
calling dgemv inside the loop:
cdef int one=1
cdef double onedot = 1.0
cdef double zerodot = 0.0
cdef char trans = 'N'
for i in xrange(N):
    dgemv(&trans, &nq, &order,\
        &onedot, <double *>np.PyArray_DATA(C), &order, \
        <double*>np.PyArray_DATA(c_x0), &one, \
        &zerodot, <double*>np.PyArray_DATA(y0), &one)
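
For reference, the argument list follows the Fortran BLAS convention: every
argument is passed by pointer, and the matrix is read in column-major order
with leading dimension lda. A minimal pure-Python sketch of what dgemv
computes is given here. dgemv_ref is a hypothetical name used only for
illustration, and positive increments are assumed; it is not how scipy or
ATLAS implement the routine.

```python
def dgemv_ref(trans, m, n, alpha, a, lda, x, incx, beta, y, incy):
    """Reference semantics of dgemv: y := alpha*op(A)*x + beta*y.

    A is an m-by-n matrix stored column-major in the flat sequence `a`,
    so element (i, j) sits at a[i + j*lda]. Assumes incx > 0 and incy > 0.
    """
    if trans in ('N', 'n'):
        rows, cols = m, n           # op(A) = A
        def elem(i, j):
            return a[i + j * lda]
    else:
        rows, cols = n, m           # op(A) = A transposed
        def elem(i, j):
            return a[j + i * lda]
    for i in range(rows):
        acc = 0.0
        for j in range(cols):
            acc += elem(i, j) * x[j * incx]
        # As in BLAS, y is not read when beta is zero.
        prev = beta * y[i * incy] if beta != 0.0 else 0.0
        y[i * incy] = alpha * acc + prev
    return y


# A = [[1, 2, 3],
#      [4, 5, 6]] stored column-major with lda = 2
a_colmajor = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
y = dgemv_ref('N', 2, 3, 1.0, a_colmajor, 2, [1.0, 1.0, 1.0], 1,
              0.0, [0.0, 0.0], 1)
```

Note that a buffer holding the same matrix in row-major (C) order, read
column-major, is the transpose. Under this convention A*x on such a buffer
would be requested with trans='T', m and n swapped, and lda equal to the
number of columns.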
It works, but it is many times slower than linking to the cblas that is
available on the same system. Specifically, I have about 8 calls to BLAS in my
tight loop: 4 of them are to dgemv and the others are to dcopy. Changing a
single dgemv call from the system cblas to the BLAS function returned by
scipy.linalg.blas.cblas.dgemv._cpointer makes the execution time of a test case
jump from about 0.7 s to about 1.25 s on my system.
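
To separate the cost of a single call from the rest of the loop, one option
is to time each variant on its own. A minimal sketch using the standard
timeit module follows. per_call_seconds and the callable passed to it are
hypothetical names, and the figure it returns includes Python-level call
overhead, which matters when the call being timed is very cheap.

```python
import timeit


def per_call_seconds(func, calls=100000, repeat=5):
    """Return the best observed wall-clock time per call of func(), in seconds."""
    # timeit.repeat runs `calls` invocations `repeat` times; taking the
    # minimum reduces the influence of other activity on the machine.
    best = min(timeit.repeat(func, number=calls, repeat=repeat))
    return best / calls


# Placeholder workload; a wrapper around each dgemv variant would go here.
baseline = per_call_seconds(lambda: None, calls=1000, repeat=3)
```

Timing a wrapper around the system cblas call and one around the scipy
pointer with the same inputs would show whether the difference is a fixed
per-call cost or grows with the matrix size.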
Any clue as to why this is happening?
After all, on Linux, scipy dynamically links to ATLAS exactly as I do when I
use the cblas functions.