[SciPy-User] Efficient iteration over a scipy.sparse.csc_matrix?

Nathaniel Smith njs@pobox....
Fri Aug 13 21:28:17 CDT 2010


On Fri, Aug 13, 2010 at 6:17 PM, Jacob Biesinger
<jake.biesinger@gmail.com> wrote:
> Hi!
> I have some large sparse matrices (csc_matrix since they are 1e10 rows x 4
> columns).  I expect that some (many) of the rows are completely empty so is
> there an efficient way to iterate over *only* the rows that have entries?
>  Perhaps something along the lines of
> useRows=set(ma.indices) ?

Not sure what kind of API you're thinking of there, but the CSC format
is not *too* hard to work with directly, once you've wrapped your head
around it. See any description on the web and the matrix attributes
ma.data, ma.indices, ma.indptr.

(In particular, note that ma.data holds the non-zero values in your
matrix, ma.indices is the same length as ma.data and gives the row
index of each corresponding value, and ma.indptr encodes column
information in a slightly more complicated way: the entries of column
j sit at positions ma.indptr[j] up to (but not including)
ma.indptr[j+1]. So you can probably slice ma.data/ma.indices and then
iterate over them.)
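
To make that concrete, here is a minimal sketch of one way to do it,
not a definitive recipe. The tiny 6 x 4 matrix is a stand-in for the
real data, and the helper name iter_nonempty_rows is made up for this
example (it is not part of scipy). It assumes the matrix holds no
explicitly stored zeros; if it might, calling ma.eliminate_zeros()
first should take care of that.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Toy stand-in for the real matrix: rows 1, 3 and 5 are empty.
dense = np.array([
    [1, 0, 0, 2],
    [0, 0, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
    [4, 0, 5, 0],
    [0, 0, 0, 0],
])
ma = csc_matrix(dense)

# Rows that hold at least one stored entry (sorted, no duplicates).
nonempty_rows = np.unique(ma.indices)


def iter_nonempty_rows(m):
    """Yield (row, column_indices, values) for each row with stored entries.

    Hypothetical helper: works on the raw CSC arrays and never touches
    the empty rows.
    """
    # Column j owns positions indptr[j]:indptr[j+1], so repeating each
    # column number by its entry count gives a column index per entry.
    cols = np.repeat(np.arange(m.shape[1]), np.diff(m.indptr))
    # A stable sort by row index groups each row's entries together.
    order = np.argsort(m.indices, kind="stable")
    rows_sorted = m.indices[order]
    rows, starts = np.unique(rows_sorted, return_index=True)
    stops = np.append(starts[1:], len(rows_sorted))
    for row, start, stop in zip(rows, starts, stops):
        sel = order[start:stop]
        yield row, cols[sel], m.data[sel]


for row, row_cols, row_vals in iter_nonempty_rows(ma):
    print(row, row_cols, row_vals)
```

The work here scales with the number of stored entries rather than
with the number of rows, which is the point when most rows are empty.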

-- Nathaniel

