[SciPy-User] Sparse vector

Felix Schlesinger schlesin@cshl....
Tue Apr 27 23:36:48 CDT 2010


Francesc Alted <faltet <at> pytables.org> writes: 
> A Wednesday 21 April 2010 23:01:14 Felix Schlesinger escrigué:
> > Storing indices separately is a good option sometimes but does not allow
> >  random access. Are you aware of any fast hash-table implementation that
> >  can store atomic values (e.g. numpy floats) instead of general Python
> >  objects? That would go a long way towards sparse vectors.
> 
> What I'd suggest is not exactly a hash-table implementation, but rather using 
> compression for keeping you sparse vectors (or arrays).  Then use a package 
> that can deal with typical numpy math operations and slicing but with 
> compressed containers instead of ndarrays, and you will have something very 
> close to what you are looking for.

I use pytables a lot (thanks for the great work), and compression can help with
some sparse data tasks, but in my case (and probably many others) random access
is important, as is performing some operation only on the entries that are
not 0.
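For concreteness, a minimal sketch of those two access patterns using scipy.sparse's dictionary-of-keys format as a stand-in for a sparse vector (a 1 x N matrix here; the specific names and sizes are illustrative, not from the discussion above):

```python
import numpy as np
from scipy.sparse import dok_matrix

N = 1_000_000
v = dok_matrix((1, N), dtype=np.float64)  # sparse "vector" as a 1 x N matrix
v[0, 42] = 3.0
v[0, 99_999] = -1.5

# Random access by index works like a dense array; unstored entries read as 0.
stored = v[0, 42]
missing = v[0, 7]  # -> 0.0, nothing is allocated for it

# Operate only on the stored (nonzero) entries, skipping the zeros entirely.
for (row, col), value in list(v.items()):
    v[row, col] = value * 2.0
```

The dict-backed layout is what makes both patterns cheap: lookups are O(1), and iteration touches only the entries that actually exist.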

> In [21]: timeit -r1 -n1 v0[np.random.randint(0, N, 10000)]
> 1 loops, best of 1: 165 ms per loop
> In [24]: timeit -r1 -n1 a0[np.random.randint(0, N, 10000)]
> 1 loops, best of 1: 1.39 ms per loop

This difference is pretty dramatic, and the scipy.sparse matrices (compressed
sparse row or dictionary of keys) are much closer to numpy performance. In some
use cases this matters; for others, pytables' very good performance on
expression evaluation over large on-disk arrays works fine.

I think there is room for a general sparse vector class in numpy (maybe based on
something in scipy.sparse), but there are many trade-offs in the design.
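To make the trade-offs concrete, here is a hypothetical sketch of the smallest version of such a class, built on a plain Python dict (all names are illustrative, not an existing numpy or scipy API). Random access and nonzero iteration are cheap; the cost is per-element Python overhead on any dense-style bulk operation:

```python
import numpy as np

class SparseVector:
    """Toy sparse vector: zeros are simply absent from the backing dict."""

    def __init__(self, length, dtype=np.float64):
        self.length = length
        self.dtype = np.dtype(dtype)
        self._data = {}  # index -> nonzero value

    def __getitem__(self, i):
        # Unstored entries read as zero of the vector's dtype.
        return self._data.get(i, self.dtype.type(0))

    def __setitem__(self, i, value):
        if value == 0:
            self._data.pop(i, None)  # keep the representation truly sparse
        else:
            self._data[i] = self.dtype.type(value)

    def nonzero_items(self):
        # Iterate only over stored entries, never over the zeros.
        return self._data.items()

    def to_dense(self):
        out = np.zeros(self.length, dtype=self.dtype)
        for i, value in self._data.items():
            out[i] = value
        return out

v = SparseVector(10)
v[3] = 2.5
v[7] = -1.0
v[3] = 0  # assigning zero removes the entry again
```

Even this toy version shows a design choice one would have to settle: whether assigning 0 deletes the entry (kept sparse here) or stores an explicit zero, which some scipy.sparse formats allow.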
