[SciPy-Dev] GSOC - improvements to the .sparse package advice

Pauli Virtanen pav@iki...
Wed Apr 10 03:10:59 CDT 2013

Izzy Cecil <lorr.cecil <at> gmail.com> writes:
> I was wondering who would be appropriate to discuss this project with,
> and if there were specific things I could do now to familiarize myself
> with the codebase. Any smaller bugs that could be fixed, or features
> that could be added to the library? Or should I consider a different
> project altogether (perhaps Pythonic dtypes)? In the meantime, I'll
> hunt around trac, and mess with what I can, but any and all advice
> would be much appreciated!

There's sort of a TODO list here, specified in terms of unit tests:


You will notice that the CSR/CSC matrices, for instance, do not have full
support for indexing, and that indexing for the DOK matrix type doesn't
function as intended. So, fixing these is at least one project. Work on
this was begun here https://github.com/scipy/scipy/pull/425 which implemented
a correct indexing mechanism for LIL matrices. However, this mechanism
is not optimized for a specific matrix type (it can be used as a fallback).
It would be important here not to lose too much speed, since CSR and CSC
have optimized fast paths for some cases.
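
To get a feel for what "full support for indexing" means in practice, one
option is to compare a few indexing forms on each sparse format against the
same operation on a dense NumPy array. The following is only a rough sketch:
the helper name check_indexing and the sample data are made up for
illustration, and which cases pass depends on the SciPy version installed.

```python
import numpy as np
import scipy.sparse as sp

# Small dense reference matrix; the values are arbitrary illustration data.
dense = np.arange(20).reshape(4, 5)
dense[dense % 3 == 0] = 0  # zero out some entries so the matrix has holes

rows = np.array([0, 2, 3])
cols = np.array([1, 4, 2])

# Indexing forms to try; each is applied to both the sparse and dense matrix.
CASES = {
    "scalar": lambda M: M[1, 2],
    "row slice": lambda M: M[1:3, :],
    "fancy": lambda M: M[rows, cols],
}

def check_indexing(fmt):
    """Hypothetical helper: compare sparse indexing in format `fmt` to NumPy.

    Returns a dict mapping case name to True/False, or to the repr of the
    exception if the sparse format does not support that form of indexing.
    """
    A = sp.csr_matrix(dense).asformat(fmt)
    results = {}
    for name, index in CASES.items():
        try:
            got = index(A)
            got = got.toarray() if sp.issparse(got) else np.asarray(got)
            # Compare flattened values only; sparse results are always 2-D.
            results[name] = bool(
                np.array_equal(np.ravel(got), np.ravel(index(dense)))
            )
        except (NotImplementedError, TypeError, IndexError, ValueError) as exc:
            results[name] = repr(exc)
    return results

if __name__ == "__main__":
    for fmt in ("csr", "csc", "lil", "dok"):
        print(fmt, check_indexing(fmt))
```

Note that the comparison flattens both results, so it checks values but not
the returned shape or type, which is a separate part of the indexing semantics.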

As (part of) a GSoC project, the objective is reasonably well defined,
because the test suite already exists.

One additional thing could be to try to improve the speed of
scipy.sparse, profiling common use cases and trying to optimize
the most commonly encountered code paths.
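
A first step for this kind of work could be a small timing harness over a
few typical operations. The sketch below uses only timeit; the function
name time_ops, the matrix size, the density, and the choice of operations
are all assumptions for illustration, not an agreed benchmark set.

```python
import timeit
import numpy as np
import scipy.sparse as sp

def time_ops(n=2000, density=0.01, repeat=3):
    """Hypothetical helper: time a few common operations on a random CSR matrix.

    Returns a dict mapping operation name to the best time in seconds.
    """
    A = sp.random(n, n, density=density, format="csr", random_state=0)
    x = np.ones(n)

    ops = {
        "matvec": lambda: A @ x,             # sparse matrix times dense vector
        "row slice": lambda: A[10:20, :],    # slicing a block of rows
        "A.T @ A": lambda: A.T @ A,          # sparse-sparse product
        "tocsc": lambda: A.tocsc(),          # format conversion
    }
    timings = {}
    for name, op in ops.items():
        # Take the minimum over several runs to reduce noise.
        timings[name] = min(timeit.repeat(op, number=1, repeat=repeat))
    return timings

if __name__ == "__main__":
    for name, seconds in time_ops().items():
        print(f"{name:10s} {seconds * 1e3:8.3f} ms")
```

For finding where the time actually goes inside one of these calls, the
standard library cProfile module can be run on the same lambdas.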

Then there are some other possible things to do, such as integrating
Expokit (matrix exponentials via Krylov methods). The dense matrix
part of Expokit was done here and is almost ready:
https://github.com/scipy/scipy/pull/354. The sparse (or matrix-free)
part hasn't been started, but it could be nice to have at the same time.
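
To illustrate what "matrix-free" means here: a Krylov method only needs a
function that applies A to a vector, never A itself. The sketch below is a
textbook Arnoldi approximation exp(A) v ~ beta * V_k exp(H_k) e_1, not
Expokit's actual algorithm (which adds time stepping and error control).
The name krylov_expm_action and its parameters are made up for illustration.

```python
import numpy as np
from scipy.linalg import expm

def krylov_expm_action(matvec, v, m=30, tol=1e-12):
    """Approximate exp(A) @ v given only matvec(x) = A @ x (illustrative sketch).

    Builds an orthonormal Krylov basis with at most m Arnoldi steps and
    evaluates the exponential of the small projected Hessenberg matrix.
    """
    v = np.asarray(v, dtype=float)
    beta = np.linalg.norm(v)
    if beta == 0.0:
        return np.zeros_like(v)

    n = v.shape[0]
    m = min(m, n)
    V = np.zeros((n, m + 1))   # orthonormal Krylov basis vectors
    H = np.zeros((m + 1, m))   # upper Hessenberg projection of A
    V[:, 0] = v / beta

    k = m
    for j in range(m):
        w = matvec(V[:, j])
        # Modified Gram-Schmidt against all previous basis vectors.
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < tol:
            # The Krylov subspace is invariant; stop with j + 1 vectors.
            k = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]

    e1 = np.zeros(k)
    e1[0] = 1.0
    # Dense expm is only applied to the small k-by-k matrix.
    return beta * (V[:, :k] @ (expm(H[:k, :k]) @ e1))
```

For a sparse matrix A this would be called as
krylov_expm_action(lambda x: A @ x, v); the accuracy depends on m and on
the norm of A, and no error estimate is computed in this sketch.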

Another possible topic is better integration with NumPy, in terms of
making binary ops like np.multiply(A, B) and np.dot(A, B) overridable
so that they can do the right thing for sparse matrices. This is, however,
not trivial, as it requires some additions to NumPy.
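
As an illustration of what "overridable" could look like, NumPy releases
later than this message (1.13 and up) provide the __array_ufunc__ protocol,
which lets a class intercept ufuncs such as np.multiply. The toy class below
is hypothetical and only stores a diagonal; it is a sketch of the idea, not
how scipy.sparse handles this. np.dot is not a ufunc and is not covered by
this hook.

```python
import numpy as np

class DiagonalMatrix:
    """Toy sparse-like container that stores only the diagonal (hypothetical)."""

    def __init__(self, diag):
        self.diag = np.asarray(diag, dtype=float)

    def toarray(self):
        return np.diag(self.diag)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Intercept plain calls to np.multiply on two DiagonalMatrix operands.
        if ufunc is np.multiply and method == "__call__" and not kwargs:
            a, b = inputs
            if isinstance(a, DiagonalMatrix) and isinstance(b, DiagonalMatrix):
                # Elementwise product of two diagonal matrices is diagonal.
                return DiagonalMatrix(a.diag * b.diag)
        # Anything else is declined, so NumPy raises TypeError.
        return NotImplemented

if __name__ == "__main__":
    C = np.multiply(DiagonalMatrix([1, 2, 3]), DiagonalMatrix([4, 5, 6]))
    print(type(C).__name__, C.diag)
```

The point of the sketch is that np.multiply returns the sparse-like type
directly, without first converting either operand to a dense array.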

Pauli Virtanen
