[SciPy-User] scipy.sparse vs. pysparse

Tony Stillfjord tony@maths.lth...
Tue Sep 27 10:35:42 CDT 2011

```Hello everyone.

I'm a Ph.D. student in Numerical Analysis where I currently work with
exponential integrators for solving PDEs.

This involves large sparse matrices, and a main task is to perform Arnoldi's
algorithm to compute a Krylov

subspace. This involves just matrix-vector multiplications and scalar
products, so those are the operations I am

interested in.

For various reasons, I have been using pysparse (
http://pysparse.sourceforge.net/) for my sparse matrix needs

until recently when I decided to migrate my code to scipy.sparse because of
some complications I ran into when

wanting to also run my code on a Windows system.

Just to check that this would not ruin my computation times, I ran some
performance tests to see how efficient

scipy is in comparison to pysparse. In short, I set up as my test matrices
the standard central finite-difference

discretizations of the Laplacian in 1 and 2 dimensions and then timed the
matrix-vector product with random (but

fixed) vectors for different sizes of the matrices (i.e. discretization mesh
widths) and vectors. I usually end up

with matrices similar to these so the results should be representative of
the performance on my real problems.

In simplified code, I did

for N in Nlist:

A = laplacian_scipy(N)

B = laplacian_pysparse(N)

scipy.random.seed(seed)

b = rand(N)

t_scipy = 1e6*timeit.timeit('A*b', number = 1e3) / 1e3

t_pysparse = 1e6*timeit.timeit('B*b', number = 1e3) / 1e3

where the 1e6 numbers are to translate the times into microseconds and the
1e3's to specify that I want to run

each command a thousand times.

The results are reported below. What one can see is that for relatively
small matrices, pysparse is a lot faster.

This factor seems to decrease as the dimension of the matrix increases until
scipy actually becomes faster than

pysparse for really big matrices.

Originally I used the CSR format for scipy.sparse as well as pysparse since
pysparse only implements the CSR,

linked-list and sparse skyline formats. However, since my matrices in this
case are tri-diagonal and five-diagonal

(though not with bandwidth 5), I figured maybe I should use the DIA format
to be fair to scipy. Those results are

also shown below, and the timings got somewhat, but not greatly, better.
Regardless, in my real problems I will

not have purely n-diagonal matrices, so the CSR-data is probably of most
interest. Note that the pysparse format

is still CSR.

Is there a lot of overhead in the scipy.sparse function calls, or is there
some other reason that scipy performs

so much worse for "small" matrices? I realise that if you have a small
enough matrix you could just as well have

it be dense, but the point where scipy starts to break even with pysparse is
quite far from the size where I would

switch to the sparse format.

Kind regards,

Tony Stillfjord

The timing data, for two different systems (I hope the formatting is not
completely destroyed upon mailing this,

sorry if that is the case):

Ubuntu 10.04

numpy v.1.6.1

scipy v.0.9.0

pysparse development version

========= CSR ==========

1D:

N SciPy pysparse

50 5.82e+01 6.17e+00

100 5.48e+01 6.63e+00

200 5.62e+01 7.58e+00

300 8.93e+01 9.15e+00

500 5.96e+01 1.03e+01

1000 6.59e+01 1.42e+01

2500 7.27e+01 2.44e+01

5000 8.64e+01 4.29e+01

10000 1.17e+02 8.12e+01

25000 2.13e+02 2.06e+02

50000 5.02e+02 6.10e+02

100000 1.25e+03 1.44e+03

2D:

N SciPy pysparse

100 5.54e+01 7.18e+00

625 6.17e+01 1.45e+01

2500 7.79e+01 3.85e+01

10000 1.44e+02 1.36e+02

40000 5.73e+02 1.05e+03

90000 1.51e+03 2.51e+03

250000 4.35e+03 7.57e+03

1000000 1.61e+04 3.33e+04

========= DIA ==========

1D:

N SciPy pysparse

50 6.51e+01 6.30e+00

100 5.53e+01 6.58e+00

200 5.70e+01 7.68e+00

300 5.74e+01 8.58e+00

500 5.86e+01 1.00e+01

1000 6.18e+01 1.37e+01

2500 6.83e+01 2.68e+01

5000 7.97e+01 4.36e+01

10000 1.04e+02 8.10e+01

25000 1.75e+02 1.91e+02

50000 3.45e+02 5.23e+02

100000 8.18e+02 1.39e+03

2D:

N SciPy pysparse

100 5.70e+01 7.15e+00

625 6.21e+01 1.44e+01

2500 7.81e+01 3.91e+01

10000 1.33e+02 1.36e+02

40000 3.79e+02 9.78e+02

90000 1.01e+03 2.55e+03

250000 6.88e+03 7.80e+03

1000000 2.74e+04 3.14e+04

Windows 7

numpy v.1.6.1

scipy v.0.10.0b2

pysparse v.1.1.1

-All compiled against Intel MKL

========= CSR ==========

1D:

N SciPy pysparse

50 3.69e+01 3.72e+00

100 3.73e+01 3.89e+00

200 3.82e+01 4.32e+00

300 3.89e+01 5.22e+00

500 3.99e+01 5.73e+00

1000 4.23e+01 7.60e+00

2500 5.10e+01 1.39e+01

5000 6.35e+01 2.44e+01

10000 8.60e+01 4.38e+01

25000 1.56e+02 1.01e+02

50000 2.71e+02 1.94e+02

100000 5.35e+02 4.22e+02

2D:

N SciPy pysparse

100 3.76e+01 4.11e+00

625 4.28e+01 8.57e+00

2500 5.62e+01 2.24e+01

10000 1.05e+02 7.34e+01

40000 2.98e+02 2.83e+02

90000 7.30e+02 8.41e+02

250000 2.66e+03 2.99e+03

1000000 1.01e+04 1.21e+04

========= DIA ==========

1D:

N SciPy pysparse

50 4.01e+01 3.77e+00

100 4.00e+01 3.93e+00

200 4.02e+01 4.33e+00

300 4.12e+01 5.03e+00

500 4.19e+01 5.81e+00

1000 4.33e+01 7.44e+00

2500 4.90e+01 1.43e+01

5000 5.66e+01 2.48e+01

10000 7.09e+01 4.49e+01

25000 1.20e+02 1.06e+02

50000 1.97e+02 1.98e+02

100000 3.87e+02 4.31e+02

2D:

N SciPy pysparse

100 4.05e+01 4.11e+00

625 4.34e+01 8.45e+00

2500 5.30e+01 2.24e+01

10000 8.58e+01 7.50e+01

40000 2.36e+02 2.82e+02

90000 5.14e+02 8.80e+02

250000 2.28e+03 3.01e+03

1000000 1.13e+04 1.21e+04
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20110927/6aa5dcb8/attachment.html
```