[SciPy-Dev] SVDLIBC for sparse SVDs

David Cournapeau cournape@gmail....
Wed Dec 12 13:31:45 CST 2012


On Mon, Dec 10, 2012 at 7:29 PM, Jake Vanderplas
<vanderplas@astro.washington.edu> wrote:
> Hi folks,
> I just came across a sparse SVD implementation based on SVDLIBC [1] with
> a nice Python wrapper utilizing SciPy's csc_matrix type [2]. SciPy
> currently includes a basic iterative sparse SVD implementation based on
> ARPACK (scipy.sparse.linalg.svds), but the implementation is somewhat
> hackish.  The SVDLIBC version uses the same principles as ARPACK --
> Lanczos factorization -- and from my quick checks, can be faster than
> the ARPACK version in some cases.  All the code, including Python
> wrappers, is released under a BSD license, so it would be fairly
> seamless to include in SciPy.
>
> On the plus side, incorporating SVDLIBC would add some well-tested
> sparse functionality and give users more powerful options.  Where our
> current svds function performs iterations within Python, the SVDLIBC
> implementation performs the iterations directly within the C code.  It
> uses the csc_matrix format internally, so no data copying is involved.
> It could fairly easily supplement or replace our current sparse SVD.
>
> On the minus side, the functionality does duplicate what we already
> have, and would involve bundling another C package in SciPy.  This might
> cause some linking headaches (what if the user already has a different
> version of SVDLIBC on their system? We experienced this with ARPACK) and
> maintenance overhead (possibility of added compilation issues; the need
> to keep up with updates to SVDLIBC). Furthermore, sparsesvd is a fairly
> lightweight Python package, and users needing the functionality could
> easily install it with pip if the need arises.
>
> I could be convinced either way, but I thought I'd ask the list: any
> thoughts on whether this would be worth including in Scipy?
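
For reference, the existing ARPACK-based routine mentioned above can be
called as below. This is only a minimal sketch: the matrix size, density,
and k are arbitrary choices for illustration, and the dense SVD is there
purely as a cross-check.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Random sparse test matrix in CSC format (size and density are arbitrary).
A = sparse_random(200, 100, density=0.05, format="csc", random_state=0)

# Ask for the k largest singular triplets.
k = 5
u, s, vt = svds(A, k=k)

# Do not rely on the order svds returns; sort descending for comparison.
s_sparse = np.sort(s)[::-1]

# Cross-check against a dense SVD (only feasible for small matrices).
s_dense = np.linalg.svd(A.toarray(), compute_uv=False)[:k]

print(s_sparse)
print(s_dense)
```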

I think that if we are changing the method for sparse SVD, we should
just use PROPACK. Fabian noticed that its license was finally changed
from unspecified to BSD (this was not the case last year).

PROPACK claims higher accuracy (it avoids computing A A'), and it was
also about one order of magnitude faster than ARPACK on some of the
matrices I have tried.
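
To illustrate why avoiding A A' can matter for accuracy, here is a small
sketch. It uses dense LAPACK routines through NumPy, not PROPACK or
ARPACK, and the test matrix (size, spectrum, seed) is an assumption made
up for the example. Forming A A' squares the singular values, so the
small ones lose roughly half of their significant digits.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Build A = U diag(s) V' with known singular values from 1 down to 1e-7.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s_true = np.logspace(0, -7, n)
A = (U * s_true) @ V.T

# Route 1: SVD applied to A directly.
s_direct = np.linalg.svd(A, compute_uv=False)

# Route 2: square roots of the eigenvalues of A A'.
eigvals = np.linalg.eigvalsh(A @ A.T)[::-1]      # descending order
s_gram = np.sqrt(np.clip(eigvals, 0.0, None))    # guard against tiny negatives

# Relative error in the smallest singular value for each route.
err_direct = abs(s_direct[-1] - s_true[-1]) / s_true[-1]
err_gram = abs(s_gram[-1] - s_true[-1]) / s_true[-1]

print("direct SVD :", err_direct)
print("via A A'   :", err_gram)
```

The direct route should show a much smaller relative error on the
smallest singular value than the route through A A'.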

cheers,
David

