[SciPy-User] fancy indexation of sparse matrices is terrible slow
Dag Sverre Seljebotn
dagss@student.matnat.uio...
Tue Dec 15 05:15:06 CST 2009
Dmitrey wrote:
> От кого: Nathan Bell <wnbell@gmail.com>
>
> 2009/12/14 Dmitrey <tmp50@ukr.net>:
> >
> > I have tried all 3, lil works faster than any others, but
> casting to numpy
> > ndarray and invoking it on the dense array obtained works
> several times
> > faster, that is extremely important for me. Thus on my problem
> with sparse
> > matrix of shape 151* 1178
> > each time casting it to dense before applying fancy indexing
> makes numerical
> > optimization ipopt solver works faster from 130 to 60 sec.
> Here's the code
> > that demonstrates the difference:
> >
> >
> > So, I have filed a ticket for it.
>
> Hi Dmitrey,
>
> I think what you're asking for is simply not possible. The time spent
> looking up elements in a dense matrix (a completely trivial
> operation)
> will always be less than the lookup in sparse matrix data structures
> where indexing elements is inherently more expensive.
>
>
> but what about dok format? I guess it shouldn't be too expensive - it
> is O(numberOfRequiredElements*log2(nonZerosNumber)), and there will
> be no peak memory jump as for lil.
>
> If you want something that does the above operation really fast
> then you can do
> >>> M = coo_matrix(M)
> >>> M.data # same as M[I,J]
> or
>
>
> But I want to have M[I, J], not just M. Some (or lots of) required
> elements can be zeros and thus absent in M.data.
>
> Why do you want to index sparse matrices in the first place? For many
> operations there are better formulations that avoid the need to do
> indexing of any kind.
>
>
> Because some optimization problems can have constraints gradient of
> size nVars*nConstraints very huge. nVars are up to ~100'000, and
> nConstraints are up to 200'000'000. I will just have memory overflow
> if I will try casting it to dense each time.
> BTW, why return type of numpy.where is int64, but type of
> scipy.sparse.find is numpy.int32? I had spent lots of time to search
> why ipopt hadn't work with that data, and it turned out it is due to
> this type difference (maybe some issues related to connecting Python
> code to C++). Also, does it mean that I cannot create sparse matrices
> with one dimension greater than max(int32) instead of max(int64)?
I can confirm this, scipy.sparse uses int32 indices only. Search for
"intc" in the sources for the relevant lines.
Dag Sverre
More information about the SciPy-User
mailing list