[SciPy-Dev] SVDLIBC for sparse SVDs
Jake Vanderplas
vanderplas@astro.washington....
Mon Dec 10 12:29:43 CST 2012
Hi folks,
I just came across a sparse SVD implementation based on SVDLIBC [1] with
a nice Python wrapper that uses SciPy's csc_matrix type [2]. SciPy
currently includes a basic iterative sparse SVD implementation based on
ARPACK (scipy.sparse.linalg.svds), but the implementation is somewhat
hackish. The SVDLIBC version uses the same principles as ARPACK --
Lanczos factorization -- and from my quick checks, it can be faster than
the ARPACK version in some cases. All the code, including the Python
wrappers, is released under a BSD license, so it would be fairly
seamless to include in SciPy.
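For reference, here is a minimal sketch of calling the existing
ARPACK-based routine. The matrix shape, density, random seed and k are
arbitrary values chosen for illustration, and the dense comparison is
only a sanity check, not a benchmark.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Illustrative test matrix (assumed values): 200 x 100, ~5% nonzero, CSC format.
A = sparse_random(200, 100, density=0.05, format="csc", random_state=0)

k = 6  # number of singular triplets to compute (must be < min(A.shape))
u, s, vt = svds(A, k=k)

# Do not rely on the order svds returns; sort explicitly, largest first.
order = np.argsort(s)[::-1]
u, s, vt = u[:, order], s[order], vt[order, :]

# Sanity check against a dense SVD (only feasible for small matrices).
s_dense = np.linalg.svd(A.toarray(), compute_uv=False)[:k]
max_abs_err = np.max(np.abs(s - s_dense))
```

Here u has shape (200, k), s has shape (k,) and vt has shape (k, 100).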
On the plus side, incorporating SVDLIBC would add some well-tested
sparse functionality and give users more powerful options. Where our
current svds function performs iterations within Python, the SVDLIBC
implementation performs the iterations directly within the C code. It
uses the csc_matrix format internally, so no data copying is involved.
It could fairly easily supplement or replace our current sparse SVD.
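To illustrate what "iterations within Python" can look like, here is a
rough sketch of one common way to get a sparse SVD out of a symmetric
eigensolver: hand ARPACK a Python callback that applies A^T A. This is
an assumption-laden illustration of the general idea, not the actual
svds source; the matrix and k are arbitrary.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import LinearOperator, eigsh

# Illustrative test matrix (assumed values), stored as CSC.
A = sparse_random(200, 100, density=0.05, format="csc", random_state=0)
n = A.shape[1]

def matvec(x):
    # Apply A^T A without forming it; ARPACK calls back into this
    # Python function for every matrix-vector product it needs.
    return A.T @ (A @ x)

AtA = LinearOperator((n, n), matvec=matvec, dtype=A.dtype)

k = 6
# Largest-magnitude eigenvalues of A^T A are the squared singular values of A.
eigvals = eigsh(AtA, k=k, which="LM", return_eigenvectors=False)
s = np.sqrt(np.sort(eigvals)[::-1])  # singular values, largest first
```

Each matrix-vector product crosses the Python/compiled boundary here,
which is the overhead a pure C implementation avoids.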
On the minus side, the functionality does duplicate what we already
have, and would involve bundling another C package in SciPy. This might
cause some linking headaches (what if the user already has a different
version of SVDLIBC on their system? We experienced this with ARPACK) and
maintenance overhead (possible added compilation issues; the need
to keep up with updates to SVDLIBC). Furthermore, sparsesvd is a fairly
lightweight Python package, and users who need the functionality could
easily install it with pip.
I could be convinced either way, but I thought I'd ask the list: any
thoughts on whether this would be worth including in SciPy?
Jake
[1] http://tedlab.mit.edu/~dr/SVDLIBC/
[2] http://pypi.python.org/pypi/sparsesvd/