[SciPy-Dev] sparse vectors / matrices / tensors
Tue Sep 20 11:06:03 CDT 2011
I have been working quite a lot with sparse vectors and sparse matrices
(e.g. as feature vectors in the context of machine learning), and have
noticed that they crop up in a lot of places (e.g. the CVXOPT library, in
scikits, ...) and that people tend to either reinvent the wheel (i.e.
implement a complete sparse matrix library) or pretend that no separate
data structure is needed (i.e. always pass along coordinate and data
arrays).
The most obvious response is to point to scipy.sparse; however, I ended up
reimplementing a sparse matrix library myself because
- scipy.sparse is limited to matrices and has no vectors or order-k tensors
- LIL and DOK are not really efficient or convenient data structures for
  constructing sparse matrices (my own library basically keeps a list of
  unordered COO entries and compacts/sorts them when the matrix is actually
  used as a matrix)
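
A minimal sketch of the deferred-compaction idea from the last bullet, in
pure Python. It is not the library described in this message: the class name
LazyCOO, its methods, and the choice to sum duplicate entries are assumptions
made for illustration only.

```python
from collections import defaultdict


class LazyCOO:
    """Hypothetical order-k sparse tensor with cheap appends.

    Entries are stored as an unordered list of (index_tuple, value) pairs
    and are only sorted and merged when the contents are actually read.
    """

    def __init__(self, shape):
        self.shape = tuple(shape)
        self._entries = []     # unordered (index_tuple, value) pairs
        self._compact = True   # True when sorted and free of duplicates

    def add(self, index, value):
        """Append one entry; duplicates are allowed until compaction."""
        index = tuple(index)
        if len(index) != len(self.shape):
            raise IndexError("index order does not match tensor order")
        if any(not 0 <= i < n for i, n in zip(index, self.shape)):
            raise IndexError("index out of bounds")
        self._entries.append((index, value))
        self._compact = False

    def compact(self):
        """Sum duplicate indices, drop zeros, and sort by index."""
        if self._compact:
            return
        totals = defaultdict(float)
        for index, value in self._entries:
            totals[index] += value
        self._entries = sorted((i, v) for i, v in totals.items() if v != 0)
        self._compact = True

    def coords_and_data(self):
        """Return parallel lists of index tuples and values, compacted."""
        self.compact()
        coords = [i for i, _ in self._entries]
        data = [v for _, v in self._entries]
        return coords, data


t = LazyCOO((3, 3))
t.add((2, 1), 1.0)
t.add((0, 0), 2.0)
t.add((2, 1), 0.5)          # duplicate index, merged on compaction
coords, data = t.coords_and_data()
print(coords, data)         # [(0, 0), (2, 1)] [2.0, 1.5]
```

Because the index is a tuple of arbitrary length, the same structure covers
vectors (order 1), matrices (order 2) and order-k tensors.
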
As a result, I built yet another sparse matrix library, and I was wondering
(i) whether there's some generic enough data structure that could be a
sparse counterpart to numpy's ndarray (i.e., good enough for 99% of the
people, 99% of the time; my current guess would be that the mutable COO
tensor implementation I wrote, or something vaguely similar, might actually
fit the bill), or
(ii) whether it would make sense to have some conventions for standardized
access to other people's sparse matrix packages, either by defining a
minimum set of methods that would be useful or by defining some kind of
low-level interface (similar to Python's buffer interface).
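
To make option (ii) concrete, here is one possible shape for a "minimum set
of methods", sketched with typing.Protocol. Nothing here is an existing
convention: SparseLike, sparse_shape, sparse_items, DictVector and
squared_norm are all hypothetical names chosen for the example.

```python
from typing import Iterator, Protocol, Tuple, runtime_checkable

Index = Tuple[int, ...]


@runtime_checkable
class SparseLike(Protocol):
    """Hypothetical minimal interface a sparse container could expose."""

    def sparse_shape(self) -> Index:
        """Return the logical shape, one extent per dimension."""
        ...

    def sparse_items(self) -> Iterator[Tuple[Index, float]]:
        """Yield (index_tuple, value) pairs for the stored entries."""
        ...


class DictVector:
    """Toy sparse vector backed by a dict, written for this example."""

    def __init__(self, size, values):
        self._size = size
        self._values = dict(values)

    def sparse_shape(self):
        return (self._size,)

    def sparse_items(self):
        for i in sorted(self._values):
            yield (i,), self._values[i]


def squared_norm(obj: SparseLike) -> float:
    """Consumer that relies only on the interface, not on the class."""
    return sum(value * value for _, value in obj.sparse_items())


v = DictVector(10, {0: 3.0, 4: 4.0})
print(isinstance(v, SparseLike))   # True
print(squared_norm(v))             # 25.0
```

A method-level protocol like this is the higher-level of the two options; a
buffer-style interface would instead expose the raw coordinate and data
arrays.
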
The answers to (i) and (ii) do depend on what people do with sparse
matrices, and I'd expect people who deal with PDEs to have different needs
than people who use sparse matrices for co-occurrence graphs, or as feature
matrices in machine learning, etc., so I'd like to hear from people who
have different use cases than I do.