[SciPy-User] How to efficiently do dot(dot( A.T, diag(d) ), A ) ?

Pauli Virtanen pav@iki...
Tue Sep 11 13:21:10 CDT 2012

11.09.2012 20:28, Hugh Perkins wrote:
> It makes me wonder though.  There is an opensource project called
> 'Eigen', for C++.
> It seems to provide good performance for matrix-matrix multiplication,
> comparable to Intel MKL, and significantly better than ublas
> http://eigen.tuxfamily.org/index.php?title=Benchmark  I'm not sure
> what the relationship is between ublas and BLAS?

Eigen doesn't provide a BLAS interface, so it would be quite a lot of
work to use it.

Moreover, it probably derives some of its speed for small matrices from
compile-time specialization, which is not available via a BLAS interface.

However, OpenBLAS/GotoBLAS could be faster than ATLAS; it also seems to
be doing well in the benchmarks you linked to.


If you are on Linux, you can easily swap the BLAS libraries used, like so:

*** OpenBLAS:

LD_PRELOAD=/usr/lib/openblas-base/libopenblas.so.0 ipython
In [11]: %timeit e = np.dot(d, c.T)
100 loops, best of 3: 14.8 ms per loop

*** ATLAS:

LD_PRELOAD=/usr/lib/atlas-base/atlas/libblas.so.3gf ipython
In [12]: %timeit e = np.dot(d, c.T)
10 loops, best of 3: 20.8 ms per loop

*** Reference BLAS:

LD_PRELOAD=/usr/lib/libblas/libblas.so.3gf:/usr/lib/libatlas.so ipython
In [11]: %timeit e = np.dot(d, c.T)
10 loops, best of 3: 89.3 ms per loop
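Independently of which BLAS is loaded, the product in the subject line
does not need an explicit diag(d): broadcasting can scale the rows of A
directly, so the n-by-n diagonal matrix is never built. A minimal sketch
with made-up sizes:

```python
import numpy as np

A = np.random.rand(1000, 50)
d = np.random.rand(1000)

# Naive form: materializes a 1000x1000 diagonal matrix.
naive = np.dot(np.dot(A.T, np.diag(d)), A)

# Broadcasting: A.T * d multiplies column j of A.T (row j of A)
# by d[j], which is exactly A.T @ diag(d), without the big matrix.
fast = np.dot(A.T * d, A)

print(np.allclose(naive, fast))  # True
```

The broadcasting version also hands BLAS a single (m, n) x (n, m)
matrix product, so the timings above apply to it directly.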

Yet another thing to watch out for is the possible use of multiple
processors at once (although I'm not sure how much that will matter in
this particular case).

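If the BLAS build is threaded, the thread count can usually be pinned
with environment variables set before NumPy is imported. Which variable
is honored depends on the build; OPENBLAS_NUM_THREADS and
OMP_NUM_THREADS are the common ones, so this is a sketch under that
assumption:

```python
import os

# Must be set before NumPy (and hence the BLAS library) is loaded;
# the variable that takes effect depends on how the BLAS was built.
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

import numpy as np

a = np.random.rand(500, 500)
b = np.random.rand(500, 500)
c = np.dot(a, b)  # runs single-threaded if the variable took effect
```

Comparing %timeit results with the variable set to 1 versus unset shows
whether threading is helping or hurting for your matrix sizes.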
Pauli Virtanen
