[SciPy-User] scipy.sparse vs. pysparse
Tony Stillfjord
tony@maths.lth...
Tue Sep 27 10:35:42 CDT 2011
Hello everyone.
I'm a Ph.D. student in Numerical Analysis where I currently work with
exponential integrators for solving PDEs.
This involves large sparse matrices, and a main task is to perform Arnoldi's
algorithm to compute a Krylov
subspace. This involves just matrix-vector multiplications and scalar
products, so those are the operations I am
interested in.
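For context, a bare-bones Arnoldi iteration looks roughly like the sketch
below (this is illustrative, not my production code; the function name
`arnoldi` and the `matvec` callback are just for this example). Note that
essentially all of the work is in the matrix-vector product and the inner
products, which is why those are the operations I benchmark:

```python
import numpy as np

def arnoldi(matvec, b, m):
    """Build an orthonormal Krylov basis V and Hessenberg matrix H
    with A @ V[:, :m] == V @ H (up to rounding), via modified
    Gram-Schmidt."""
    n = len(b)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = matvec(V[:, j])                # the matrix-vector product
        for i in range(j + 1):
            H[i, j] = np.dot(V[:, i], w)   # the scalar products
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:            # "happy breakdown"
            return V[:, :j + 1], H[:j + 1, :j]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H
```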
For various reasons, I have been using pysparse (
http://pysparse.sourceforge.net/) for my sparse matrix needs until
recently, when I decided to migrate my code to scipy.sparse because of
some complications I ran into when I also wanted to run my code on a
Windows system.
Just to check that this would not ruin my computation times, I ran some
performance tests to see how efficient scipy.sparse is compared to
pysparse. In short, I set up the standard central finite-difference
discretizations of the Laplacian in one and two dimensions as test
matrices, and then timed the matrix-vector product with random (but
fixed) vectors for different matrix sizes (i.e. discretization mesh
widths). I usually end up with matrices similar to these, so the results
should be representative of the performance on my real problems.
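Concretely, the test matrices can be built along these lines (a sketch of
the construction, with hypothetical names `laplacian_1d`/`laplacian_2d`;
the mesh-width scaling is omitted since it does not affect matvec timing):

```python
import numpy as np
import scipy.sparse as sp

def laplacian_1d(N, format='csr'):
    # standard [1, -2, 1] central-difference stencil on N points
    e = np.ones(N)
    return sp.spdiags([e, -2.0 * e, e], [-1, 0, 1], N, N).asformat(format)

def laplacian_2d(n, format='csr'):
    # 5-point stencil on an n-by-n grid via a Kronecker sum;
    # the resulting matrix has N = n**2 rows and 5 nonzero diagonals
    L = laplacian_1d(n, format='csr')
    I = sp.identity(n, format='csr')
    return (sp.kron(I, L) + sp.kron(L, I)).asformat(format)
```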
In simplified (but runnable) code, I did

import timeit
import scipy

for N in Nlist:
    A = laplacian_scipy(N)
    B = laplacian_pysparse(N)
    scipy.random.seed(seed)
    b = scipy.rand(N)
    t_scipy = 1e6 * timeit.timeit(lambda: A * b, number=1000) / 1000
    t_pysparse = 1e6 * timeit.timeit(lambda: B * b, number=1000) / 1000

where the factor 1e6 translates the times into microseconds, and
number=1000 specifies that I want to run each statement a thousand times
(timeit requires an integer count here, and passing a callable instead of
a statement string avoids having to set up the namespace separately).
The results are reported below. What one can see is that for relatively
small matrices, pysparse is a lot faster. This factor decreases as the
dimension of the matrix increases, until scipy actually becomes faster
than pysparse for really big matrices.
Originally I used the CSR format for scipy.sparse as well as for
pysparse, since pysparse only implements the CSR, linked-list and sparse
skyline formats. However, since my matrices in this case are
tri-diagonal and five-diagonal (though the latter not with bandwidth 5),
I figured I should also try the DIA format to be fair to scipy. Those
results are shown below as well; the timings got somewhat, but not
greatly, better. Regardless, in my real problems I will not have purely
n-diagonal matrices, so the CSR data is probably of most interest. Note
that in the DIA tables the pysparse format is still CSR.
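For completeness, the DIA variant is just a format conversion of the same
matrix; a minimal sketch (names and sizes here are illustrative, not the
actual benchmark):

```python
import numpy as np
import scipy.sparse as sp

# Build a tridiagonal test matrix in CSR, convert to DIA, and check
# that both formats give the same matvec result. DIA stores one dense
# array per nonzero diagonal and so avoids CSR's per-element column
# index lookups for banded matrices.
N = 1000
e = np.ones(N)
A_csr = sp.spdiags([e, -2.0 * e, e], [-1, 0, 1], N, N).tocsr()
A_dia = A_csr.todia()

b = np.random.rand(N)
assert np.allclose(A_csr * b, A_dia * b)
```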
Is there a lot of overhead in the scipy.sparse function calls, or is
there some other reason that scipy performs so much worse for "small"
matrices? I realise that a small enough matrix could just as well be
stored dense, but the point where scipy starts to break even with
pysparse is quite far from the size where I would switch to the sparse
format.
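What makes me suspect fixed per-call overhead is that the scipy times
are nearly flat for small N: if the O(N) multiply dominated, doubling N
should roughly double the time. A quick sketch of how one could probe
this (the helper name `matvec_time_us` is made up for this example, and
absolute numbers will of course differ per machine):

```python
import timeit
import numpy as np
import scipy.sparse as sp

def matvec_time_us(N, number=1000):
    """Average time of one CSR matvec on a tridiagonal N x N matrix,
    in microseconds."""
    e = np.ones(N)
    A = sp.spdiags([e, -2.0 * e, e], [-1, 0, 1], N, N).tocsr()
    b = np.random.rand(N)
    return 1e6 * timeit.timeit(lambda: A * b, number=number) / number

# A flat profile over the small sizes points to Python-level,
# size-independent overhead rather than the multiply itself.
for N in (50, 100, 1000, 10000):
    print(N, matvec_time_us(N))
```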
Kind regards,
Tony Stillfjord
The timing data, in microseconds, for two different systems (I hope the
formatting is not completely destroyed upon mailing this; sorry if that
is the case):
Ubuntu 10.04
numpy v.1.6.1
scipy v.0.9.0
pysparse development version
========= CSR ==========
1D:
N SciPy pysparse
50 5.82e+01 6.17e+00
100 5.48e+01 6.63e+00
200 5.62e+01 7.58e+00
300 8.93e+01 9.15e+00
500 5.96e+01 1.03e+01
1000 6.59e+01 1.42e+01
2500 7.27e+01 2.44e+01
5000 8.64e+01 4.29e+01
10000 1.17e+02 8.12e+01
25000 2.13e+02 2.06e+02
50000 5.02e+02 6.10e+02
100000 1.25e+03 1.44e+03
2D:
N SciPy pysparse
100 5.54e+01 7.18e+00
625 6.17e+01 1.45e+01
2500 7.79e+01 3.85e+01
10000 1.44e+02 1.36e+02
40000 5.73e+02 1.05e+03
90000 1.51e+03 2.51e+03
250000 4.35e+03 7.57e+03
1000000 1.61e+04 3.33e+04
========= DIA ==========
1D:
N SciPy pysparse
50 6.51e+01 6.30e+00
100 5.53e+01 6.58e+00
200 5.70e+01 7.68e+00
300 5.74e+01 8.58e+00
500 5.86e+01 1.00e+01
1000 6.18e+01 1.37e+01
2500 6.83e+01 2.68e+01
5000 7.97e+01 4.36e+01
10000 1.04e+02 8.10e+01
25000 1.75e+02 1.91e+02
50000 3.45e+02 5.23e+02
100000 8.18e+02 1.39e+03
2D:
N SciPy pysparse
100 5.70e+01 7.15e+00
625 6.21e+01 1.44e+01
2500 7.81e+01 3.91e+01
10000 1.33e+02 1.36e+02
40000 3.79e+02 9.78e+02
90000 1.01e+03 2.55e+03
250000 6.88e+03 7.80e+03
1000000 2.74e+04 3.14e+04
Windows 7
numpy v.1.6.1
scipy v.0.10.0b2
pysparse v.1.1.1
-All compiled against Intel MKL
========= CSR ==========
1D:
N SciPy pysparse
50 3.69e+01 3.72e+00
100 3.73e+01 3.89e+00
200 3.82e+01 4.32e+00
300 3.89e+01 5.22e+00
500 3.99e+01 5.73e+00
1000 4.23e+01 7.60e+00
2500 5.10e+01 1.39e+01
5000 6.35e+01 2.44e+01
10000 8.60e+01 4.38e+01
25000 1.56e+02 1.01e+02
50000 2.71e+02 1.94e+02
100000 5.35e+02 4.22e+02
2D:
N SciPy pysparse
100 3.76e+01 4.11e+00
625 4.28e+01 8.57e+00
2500 5.62e+01 2.24e+01
10000 1.05e+02 7.34e+01
40000 2.98e+02 2.83e+02
90000 7.30e+02 8.41e+02
250000 2.66e+03 2.99e+03
1000000 1.01e+04 1.21e+04
========= DIA ==========
1D:
N SciPy pysparse
50 4.01e+01 3.77e+00
100 4.00e+01 3.93e+00
200 4.02e+01 4.33e+00
300 4.12e+01 5.03e+00
500 4.19e+01 5.81e+00
1000 4.33e+01 7.44e+00
2500 4.90e+01 1.43e+01
5000 5.66e+01 2.48e+01
10000 7.09e+01 4.49e+01
25000 1.20e+02 1.06e+02
50000 1.97e+02 1.98e+02
100000 3.87e+02 4.31e+02
2D:
N SciPy pysparse
100 4.05e+01 4.11e+00
625 4.34e+01 8.45e+00
2500 5.30e+01 2.24e+01
10000 8.58e+01 7.50e+01
40000 2.36e+02 2.82e+02
90000 5.14e+02 8.80e+02
250000 2.28e+03 3.01e+03
1000000 1.13e+04 1.21e+04