[SciPy-Dev] SVDLIBC for sparse SVDs
Wed Dec 12 13:15:42 CST 2012
Here is a quick notebook benchmarking the speeds of ARPACK vs SVDLIBC:
Quick summary: SVDLIBC seems to be faster for matrices with N smaller
than a few hundred, while ARPACK is faster for larger matrices. I think
this reflects the Python overhead in ARPACK's iteration interface, which
becomes negligible as the cost of each iteration grows.
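For anyone who wants to reproduce the ARPACK half of this comparison, here is a minimal timing sketch. It is not the notebook's code: the helper name time_svds, the matrix sizes, the density and k are all assumptions chosen for illustration, and it only times scipy.sparse.linalg.svds on random square CSC matrices.

```python
import time

import scipy.sparse as sp
from scipy.sparse.linalg import svds


def time_svds(n, k=5, density=0.05, repeats=3, seed=0):
    """Best wall-clock time (seconds) of svds on a random n x n CSC matrix.

    Hypothetical helper: sizes, density and k are illustrative choices.
    """
    A = sp.random(n, n, density=density, format="csc", random_state=seed)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        svds(A, k=k)  # ARPACK-based; the Lanczos iteration is driven from Python
        best = min(best, time.perf_counter() - t0)
    return best


# Time a few sizes on either side of "N of a few hundred".
timings = {n: time_svds(n) for n in (100, 400, 1600)}
for n, t in timings.items():
    print(f"N = {n:5d}: {t:.4f} s")
```

Timing the SVDLIBC wrapper on the same matrices would give the other half of the comparison; the absolute numbers will of course depend on the machine and the SciPy build.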
I haven't compared the stability or accuracy of the two algorithms.
However, both use Lanczos diagonalization under the hood, so I'd expect
them to be similar in this regard.
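One way such an accuracy comparison could be set up is sketched below. It only covers the ARPACK side: it checks the singular values from svds against a dense numpy.linalg.svd reference and computes the residuals of the singular triplets. The matrix shape, density and k are assumptions picked so that the dense reference stays cheap.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# Small random test matrix (illustrative sizes, not from the benchmark).
A = sp.random(300, 200, density=0.05, format="csc", random_state=0)
k = 6

u, s, vt = svds(A, k=k)

# Dense reference: numpy returns singular values in descending order.
s_dense = np.linalg.svd(A.toarray(), compute_uv=False)[:k]

# svds does not promise descending order, so sort before comparing.
s_sorted = np.sort(s)[::-1]
rel_err = np.max(np.abs(s_sorted - s_dense) / s_dense)

# Residual of each triplet: || A v_i - s_i u_i ||_2
resid = np.linalg.norm(A @ vt.T - u * s, axis=0)

print("max relative error in singular values:", rel_err)
print("max triplet residual:", resid.max())
```

The same two quantities computed for the SVDLIBC output would make the comparison direct.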
Unless there's another compelling reason to port SVDLIBC to scipy, it
looks like the ARPACK svd is generally a sufficient option. It's more
flexible, faster where it matters, and (perhaps most importantly)
already in the library :)
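As a small illustration of the flexibility point, the sketch below calls svds directly on a dense array, with no CSC conversion, and passes an explicit tolerance and iteration cap. It assumes a SciPy version recent enough for svds to accept the tol and maxiter keywords; the matrix size and the parameter values are arbitrary.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 300))  # dense input, passed as-is

# tol and maxiter values here are illustrative, not recommendations.
u, s, vt = svds(A, k=4, tol=1e-6, maxiter=1000)

# Compare against the four largest singular values from a dense SVD.
s_ref = np.linalg.svd(A, compute_uv=False)[:4]
print(np.sort(s)[::-1])
print(s_ref)
```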
On 12/12/2012 04:50 AM, Fabian Pedregosa wrote:
> On Mon, Dec 10, 2012 at 7:29 PM, Jake Vanderplas wrote:
> Hi folks,
> I just came across a sparse svd implementation based on SVDLIBC with
> a nice python wrapper utilizing Scipy's csc_matrix type. Scipy
> currently includes a basic iterative sparse svd implementation based on
> ARPACK (scipy.sparse.linalg.svds), but the implementation is somewhat
> hackish. The SVDLIBC version uses the same principles as ARPACK --
> Lanczos factorization -- and from my quick checks, can be faster than
> the ARPACK version in some cases. All the code, including python
> wrappers, is released under a BSD license, so it would be fairly
> seamless to include in Scipy.
> On the plus side, incorporating SVDLIBC would add some well-tested
> sparse functionality and give users more powerful options. Where our
> current svds function performs iterations within python, the SVDLIBC
> implementation performs the iterations directly within the C code. It
> uses the csc_matrix format internally, so no data copying is involved.
> It could fairly easily supplement or replace our current sparse svd.
> I have used this routine for the past few weeks. Frankly, I saw no
> improvement in performance over the current ARPACK implementation,
> plus I found it annoying to have to explicitly convert to CSC. Also,
> the current bindings do not provide any optional parameters such as
> tolerance or maxiter.
> Some of my applications involve large dense matrices, and in that case
> converting to CSC kills performance, losing a factor of 2-3 over ARPACK.
> But I'd be interested to see if it has practical advantages
> (stability? accuracy?) over ARPACK.