[SciPy-dev] sparse comments and questions
Nathan Bell
wnbell@gmail....
Sat Jul 7 20:49:43 CDT 2007
Currently the sparse matrix storage formats (e.g. sparse.csr_matrix )
only support integer indices ( numpy.intc ) and floating point data
(float32, float64, complex64, and complex128). While this addresses
the common usage, it would be nice to support other index and data
formats also. For example, when using a csr_matrix to represent a
graph the dtype is mostly irrelevant, so smaller types like uint8
would be desirable. Likewise, on a 64-bit machine, one might like to
use 32-bit indices (instead of the default C-int size) to economize on
space.
The sparsetools routines are fully templated, so it's a simple matter
to support the other types there. However, I don't know whether other
subpackages that use sparse matrices would require changes. Any
thoughts?
I have a few other questions and comments regarding the sparse package.
1) Does anyone depend on the .ftype attribute of the sparse formats?
Can it be eliminated? If not, can it be trapped by __getattr__() so
that it doesn't become unsynchronized from data.dtype?
2) When operating on multiple types, what's the best way to get the
correct upcast type? For example int32 + float32 -> float64. I know
this can be done experimentally (
http://www.scipy.org/Tentative_NumPy_Tutorial/Discussion ). Is there
a better way?
3) Should we allow modifications to CSR and CSC matrices that force
O(N) updates/reallocations. For instance, adding a new value into a
CSR matrix is essentially as expensive as building a new one from
scratch. Could we instead prohibit such modifications and raise an
exception informing the user that lil_matrix is the proper format for
such operations? Note that changing an existing matrix value is not
so costly (typically O(1)), so not all calls to __setitem__ would be
banned.
4) lil_matrix now supports more sophisticated slicing, but the
implementation is pretty hairy and not especially fast. Is anyone
looking at this? Does the slicing work correctly? I fixed a bug
related to negative indices, but I'm not completely confident in the
robustness of my fix.
5) Scipy's spdiags() works differently than the MATLAB version.
Should it be changed to coincide? If we later support a sparse
diagonal format, should the output of spdiags change to dia_matrix
instead of the present csc_matrix?
6) When working with sparse matrices, do people expect the result to
be of the same type? For example, it's far more efficient to make
transpose(csr_matrix) output a csc_matrix (as is currently done in
SciPy) since that's a O(1) operation (as opposed to an O(n) cost
otherwise). So in general, should a method to return the result in
most efficient output format, relying on the user to convert it to
another format if so desired? I.e. if you need a csr_matrix at a
given point, then you should call .tocsr() to be sure. If it's
already a csr_matix, then no conversion will be done.
7) Where do functions that operate on sparse matrices belong? For
example, suppose I wrote a function to extract the lower diagonal
entries of a matrix. Should it be added to sparse.py (like spdiags)
or somewhere else? I'd like to add some functionality to allow the
extraction of submatrices from CSR/CSC matrices. This could be used
when removing the DOFs corresponding to boundary conditions from a
bilinear form. Does anyone have a suggestion for an interface for
this functionality? It's possible to implement this via sparse matrix
multiplication, albeit with some additional overhead.
--
Nathan Bell wnbell@gmail.com
More information about the Scipy-dev
mailing list