# [SciPy-User] How to efficiently do dot(dot( A.T, diag(d) ), A ) ?

Hugh Perkins hughperkins@gmail....
Tue Sep 11 23:56:55 CDT 2012

```(pressed send by accident too soon, sorry :-(

The good news is: if I use larger matrices, which is actually closer
to my target scenario, then it seems the scipy with openblas is *very*
good on a
dual-core machine:

matlab:

>> n = 200000;
k = 100;
a = spdiags(rand(n,1),0,n,n);
c = rand(k,n);
tic, d = c*a*c'; toc
tic, d = c*a*c'; toc

Elapsed time is 1.322931 seconds.
Elapsed time is 1.308237 seconds.

python:

python testmult.py
Elapsed time: 0.408543109894
Elapsed time: 1.15053105354

However, on a 12-core machine, it seems that scipy/openblas is rather
less competitive:

matlab:

>> n = 200000;
>> k = 100;
>> a = spdiags(rand(n,1),0,n,n);
>> c = rand(k,n);
>> tic, d = c*a; toc
Elapsed time is 0.106401 seconds.
>> tic, d = d*c'; toc
Elapsed time is 0.098882 seconds.

python:

\$ python testmult2.py
Elapsed time: 0.168606996536
Elapsed time: 0.584333896637

python code:

from __future__ import division
from scipy import *
import scipy.sparse as sparse
import time

start = 0
def tic():
global start
start = time.time()

def toc():
global start
print "Elapsed time: " + str( time.time() - start )
start = time.time()

n = 200000;
k = 100;
a = sparse.spdiags( rand(n), 0, n, n )
c = rand(n,k);

tic(); d = c.T * a; toc()
tic(); e = dot( d, c ); toc()

Note that openblas seems to use multithreading by default:

\$ python testmult2.py
Elapsed time: 0.145273923874
Elapsed time: 0.750689983368

\$ python testmult2.py
Elapsed time: 0.168558120728
Elapsed time: 0.578655004501

\$ python testmult2.py
Elapsed time: 0.168849945068
Elapsed time: 0.591191053391

... but this is apparently not enough to get quite as good as matlab :-(

Looking at the ratio of the times for 12 threads and 1 thread, it
looks like the openblas performance doesn't change much with the