[SciPy-User] fancy indexation of sparse matrices is terrible slow

Dag Sverre Seljebotn dagss@student.matnat.uio...
Tue Dec 15 05:15:06 CST 2009

Dmitrey wrote:
> От кого: Nathan Bell <wnbell@gmail.com>
>     2009/12/14 Dmitrey <tmp50@ukr.net>:
>     >
>     > I have tried all 3, lil works faster than any others, but
>     casting to numpy
>     > ndarray and invoking it on the dense array obtained works
>     several times
>     > faster, that is extremely important for me. Thus on my problem
>     with sparse
>     > matrix of shape 151* 1178
>     > each time casting it to dense before applying fancy indexing
>     makes numerical
>     > optimization ipopt solver works faster from 130 to 60 sec.
>     Here's the code
>     > that demonstrates the difference:
>     >
>     >
>     > So, I have filed a ticket for it.
>     Hi Dmitrey,
>     I think what you're asking for is simply not possible. The time spent
>     looking up elements in a dense matrix (a completely trivial
>     operation)
>     will always be less than the lookup in sparse matrix data structures
>     where indexing elements is inherently more expensive.
> but what about dok format? I guess it shouldn't be too expensive - it 
> is  O(numberOfRequiredElements*log2(nonZerosNumber)), and there will 
> be no peak memory jump as for lil.
>     If you want something that does the above operation really fast
>     then you can do
>     >>> M = coo_matrix(M)
>     >>> M.data # same as M[I,J]
>     or
> But I want to have M[I, J], not just M. Some (or lots of) required 
> elements can be zeros and thus absent in  M.data.
>     Why do you want to index sparse matrices in the first place? For many
>     operations there are better formulations that avoid the need to do
>     indexing of any kind.
> Because some optimization problems can have constraints gradient of 
> size nVars*nConstraints very huge.  nVars are up to ~100'000, and 
> nConstraints are up to 200'000'000. I will just have memory overflow 
> if I will try casting it to dense each time.
> BTW, why return type of numpy.where is int64, but type of 
> scipy.sparse.find is numpy.int32? I had spent lots of time to search 
> why ipopt hadn't work with that data, and it turned out it is due to 
> this type difference (maybe some issues related to connecting Python 
> code to C++). Also, does it mean that I cannot create sparse matrices 
> with one dimension greater than max(int32)  instead of max(int64)?
I can confirm this, scipy.sparse uses int32 indices only. Search for 
"intc" in the sources for the relevant lines.

Dag Sverre

More information about the SciPy-User mailing list