[SciPy-User] Calling scipy blas from cython is slow and non-deterministic

Sergio Callegari sergio.callegari@gmail....
Sat Feb 23 10:07:26 CST 2013


Hi,

following the excellent advice that I have received from V. Armando Sole
on the numpy mailing list, I have finally succeeded in
calling the blas routines shipped with scipy from cython.

I am doing this to avoid shipping an extra blas library in windows,
for a project of mine that uses scipy but has some things coded in cython
for speed.


So far I managed getting things (almost) working on Linux.  Here is what I do:


The following code snippet gives me the dgemv pointer (which is a pointer to a
fortran function, even if it comes from scipy.linalg.blas.cblas, weird).

from cpython cimport PyCObject_AsVoidPtr
import scipy as sp
__import__('scipy.linalg.blas')

ctypedef void (*dgemv_ptr) (char *trans, int *m, int *n,\
                     double *alpha, double *a, int *lda, double *x,\
                     int *incx,\
                     double *beta,  double *y, int *incy)
cdef dgemv_ptr dgemv=<dgemv_ptr>PyCObject_AsVoidPtr(\
    sp.linalg.blas.cblas.dgemv._cpointer)


Then, in a tight loop, I can call dgemv by first defining some constants
(since fortran stuff wants everything passed by address)

    cdef int one=1
    cdef double onedot = 1.0
    cdef double zerodot = 0.0
    cdef char trans = 'N'


... and then calling dgemv inside the loop


    for i in xrange(N):
        dgemv(&trans, &nq, &order,\
            &onedot, <double *>np.PyArray_DATA(C), &order, \
            <double*>np.PyArray_DATA(c_x0), &one, \
            &zerodot, <double*>np.PyArray_DATA(y0), &one)


This works, sort of. But I have two major issues:

1) It is non deterministic. About 1 times over 6 it gives the wrong result.
Using cblas_dgemv always gives the correct result.  Am I forgetting the
initialization of something?

2) It is much much slower than using the cblas_dgemv that I can get by linking
to atlas.  Specifically, I have about 8 calls to blas in my tight loop, 6 of
them are to dgemv and the others are to dcopy.  Changing a single dgemv call
from the system cblas to the blas function returned by
scipy.linalg.blas.cblas.dgemv._cpointer makes the execution time of a test case
jump from about 0.7 s to 1.25 on my system. This is a huge slow down of 80% for
just a single function!

Any clue about why is this happening? In the end, on linux, scipy's fblas.so is
dynamically linked to atlas exactly like my code when I access the cblas_dgemv. 





More information about the SciPy-User mailing list