[SciPy-dev] sparse matrix multiplication - poor performance

Nathan Bell wnbell at gmail.com
Thu Dec 7 07:22:58 CST 2006

```On 12/7/06, Joachim Dahl <dahl.joachim at gmail.com> wrote:
> Can this be because the matrices are very simple and not stored in CCS or
> CRS
>  format in matlab?  For matrices with very simple structures,  it probably
> makes sense
>  to keep track of that, e.g., to precompute the sparsity pattern of the
> product.
>
>  What happens in your tests for matrices with random sparsity patterns?

That's a possible concern - perhaps MATLAB treats k-diagonal matrices
in a special manner?  However, my experience suggests that this is not
the cause - it's not that MATLAB is necessarily fast, just that scipy
is unacceptably slow.

Below is a test using a random symmetric matrix.  Since the matrix is
symmetric, it's not necessary to test CSR.T and CSC.T since those are
just CSC and CSR respectively.

Note that the CSC*CSR _is_ a bad code path (as Robert suggested there
might be).  However, even in the best of cases, Scipy is still much
much slower.  I thought the resizing might be the culprit, but that
doesn't appear to be the case since CSC*CSC also resizes, but doesn't
take as long as CSC*CSR.

--------------------MATLAB CODE--------------------
N = 1000;
A = speye(N);

for i=1:N
for j=1:5
A(i,floor(N*rand)+1) = 1;
end
end

S = A+A';

nnz(A)
tic; B = A*A; toc;
tic; B = A'*A; toc;
tic; B = A*A'; toc;
--------------------MATLAB OUTPUT--------------------
ans =

5987

Elapsed time is 0.024908 seconds.
Elapsed time is 0.032202 seconds.
Elapsed time is 0.025415 seconds.

--------------------SCIPY CODE--------------------
from scipy import *
import time
N = 1000
A = sparse.lil_matrix((N,N))

for i in range(N):
A[i,i] = 1
for j in range(5):
A[i,int(floor(N*rand()))] = 1

A = A.tocsr()
S = A+A.T #make symmetric
CSR = S.tocsr()
CSC = S.tocsc()

print "nnz S ",S.nnz

start = time.clock()
B = CSR*CSR
print "CSR CSR = ",time.clock() - start

start = time.clock()
B = CSC*CSC
print "CSC CSC = ",time.clock() - start

start = time.clock()
B = CSR*CSC
print "CSR CSC = ",time.clock() - start

start = time.clock()
B = CSC*CSR
print "CSC CSR = ",time.clock() - start

--------------------SCIPY OUTPUT--------------------
nnz S  10960
CSR CSR =  0.8
Resizing... 217 381 21920
Resizing... 386 855 39076
Resizing... 630 536 63036
Resizing... 857 396 86326
Resizing... 980 171 98637
Resizing... 998 836 100593
Resizing... 999 878 100711
Resizing... 999 987 100724
CSC CSC =  0.58
CSR CSC =  0.61
Resizing... 217 381 21920
Resizing... 386 855 39076
Resizing... 630 536 63036
Resizing... 857 396 86326
Resizing... 980 171 98637
Resizing... 998 836 100593
Resizing... 999 878 100711
Resizing... 999 987 100724
CSC CSR =  84.05

-------------------- END OUTPUT--------------------

>  This surprised me:
>  >> N=10000;
>  >> A=speye(N);
>  >> tic, B=A*A; toc
>  Elapsed time is 0.001630 seconds.
>  >> tic, B=3*A; toc
>  Elapsed time is 2.649450 seconds.

That appears to be an aberrant result, I get:

>> A = speye(10000);
>> tic; B = A*A; toc;
Elapsed time is 0.003558 seconds.
>> tic; B = 3*A; toc;
Elapsed time is 0.020551 seconds.
>> A = speye(10000);
>> tic; B = A*A; toc;
Elapsed time is 0.003411 seconds.
>> tic; B = 3*A; toc;
Elapsed time is 0.000564 seconds.