[SciPy-User] How to efficiently do dot(dot( A.T, diag(d) ), A ) ?

Tiziano Zito opossumnano@gmail....
Tue Sep 18 04:47:06 CDT 2012

> > However, OpenBLAS/GotoBLAS could be better than ATLAS, it seems to be
> > also doing well in the benchmarks you linked to:
> >
> > If you are on Linux, you can easily swap the BLAS libraries used, like so:
> Ah great!  This is 4 times faster for me:
> Using atlas:
> $ python testmult.py
> Elapsed time: 0.0221469402313
> Elapsed time: 0.21438908577
> Using goto/openblas:
> $ python testmult.py
> Elapsed time: 0.0214130878448
> Elapsed time: 0.051687002182

What ATLAS package are you using? If you are on Debian/Ubuntu, the
default libatlas3-base is *not* optimized for your CPU. With ATLAS
most optimizations happen at build time, so to take full advantage
of ATLAS you *need* to compile it on the machine you are going to
use it on. The binary package you get from Ubuntu/Debian was
compiled on a Debian developer's machine and is not going to be
tuned for yours.
You need to follow the (very simple) instructions found in
/usr/share/doc/libatlas3-base/README.Debian to compile ATLAS on your
CPU, so that ATLAS actually has a chance to optimize. In my
experience this can make ATLAS a lot (up to 10x) faster.
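
To see whether a rebuilt ATLAS actually helps, you can run a small
timing script before and after installing the new packages. The
following is only a rough sketch and is not the testmult.py quoted
above (which is not shown in this message): the matrix sizes, the
function names and the best-of-three timing are my own assumptions.
It times the literal dot(dot(A.T, diag(d)), A) from the subject line
against an equivalent form that scales by d with broadcasting instead
of building diag(d).

```python
import time

import numpy as np


def atda_diag(A, d):
    # Literal form from the subject line: builds a dense len(d) x len(d) matrix.
    return np.dot(np.dot(A.T, np.diag(d)), A)


def atda_broadcast(A, d):
    # Same product without forming diag(d): A.T * d scales column j of A.T by d[j].
    return np.dot(A.T * d, A)


def best_time(func, *args, repeats=3):
    # Smallest wall-clock time over a few runs (hypothetical helper).
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Sizes are arbitrary assumptions; increase them for a more realistic test.
rng = np.random.RandomState(0)
A = rng.rand(500, 100)
d = rng.rand(500)

# Both forms must give the same 100 x 100 result.
assert np.allclose(atda_diag(A, d), atda_broadcast(A, d))

print("Elapsed time (diag):     ", best_time(atda_diag, A, d))
print("Elapsed time (broadcast):", best_time(atda_broadcast, A, d))
```

The absolute numbers depend on the machine and on the BLAS that NumPy
ends up using, so only compare runs made on the same machine.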

For the lazy, here are the instructions (the name of the atlas-3.8.4
directory depends on the ATLAS version that apt-get source fetches):

# cd /tmp
# apt-get source atlas
# apt-get build-dep atlas
# cd atlas-3.8.4
# fakeroot debian/rules custom
# cd ..

This will produce a series of .deb packages that you can install with:

# dpkg -i *.deb

