[SciPy-user] New sparse matrix functionality

Ed Schofield schofield at ftw.at
Mon Feb 27 03:21:26 CST 2006

```Hi all,

I'd created a new SVN branch with some new functionality and changes to
sparse matrices.  This post describes what's new and what's different
versus the main branch.  If you have an existing SVN tree, you can pull
in my branch with the command "svn switch
http://svn.scipy.org/svn/scipy/branches/ejs".

One of the conclusions of Jonathan Guyer's comparison between
scipy.sparse and PySparse in November
(http://article.gmane.org/gmane.comp.python.scientific.devel/3187/match=scipy+pysparse+wiki
and http://old.scipy.org/wikis/featurerequests/SparseSolvers) was that
SciPy's support for efficient construction of sparse matrices is weak.
My patch adds a new data type, lil_matrix, that stores non-zeros as a
list of sorted Python lists.  For the simple benchmark of creating a new
10^3 x 10^5 matrix with 10^4 non-zero elements in random locations and
converting to CSR and CSC matrices, the lil_matrix format is slightly
more than twice as fast as dok_matrix.  It's a row-wise format, so
conversion to CSR is very fast, whereas conversion to CSC goes through
CSR internally.  Index lookups use binary searches, so they take log
time.  I think the implementation is already complete enough for most
uses -- so please try it out and tell me how you go!

Another of Jonathan's observations was that SciPy had no support for
slicing matrices with A[i, :] like in PySparse.  My patch adds support
for slice notation and NumPy-style fancy indexing to dok_matrix and
lil_matrix objects.  With this it's possible to build sparse matrices
quickly -- for example:

>>> a = lil_matrix((3,5))
>>> a[1,[2,3,4]] = range(3)
>>> a[2,:] = 3 * a[1,:]

A third new feature is a .sum() method, which takes a single axis
argument like in NumPy.

My branch also changes one aspect of the existing behaviour: the
todense() method now returns a dense (NumPy) matrix, rather than a dense
array.  Converting to a dense array is now available under a toarray()
method (or .A attribute) instead.  The rationale behind this change is
to emphasize that sparse matrices are closer to dense matrices than to
dense arrays in their attributes (e.g. .T for transpose) and behaviour
(e.g. * means inner product).  I've also been careful to make
multiplication between sparse matrices and dense row or column vectors
(matrix objects with unit length or height) return matrices of the
correct dimensions, rather than arrays.  Several unit tests relied on
the old behaviour, and I've changed these accordingly.  Most of these
test changes are just simplifications -- for example

assert_array_equal((a*c).todense(), a.todense() * c)

assert_array_equal((a*c).todense(), dot(a.todense(), c))

-- but I'd appreciate some criticism and feedback on which behaviour
people prefer.

These changes have highlighted a problem present in both the main trunk
and my branch: that multiplying a dense matrix 'a' by a sparse matrix
'b' is not possible using the syntax 'a*b'.  I'll follow this up with a
proposal to numpy-discussion on how we can solve this.

-- Ed

```