[SciPy-dev] slow setitem

Nathan Bell wnbell@gmail....
Tue Dec 1 13:32:27 CST 2009


2009/11/30 Benny Malengier <benny.malengier@gmail.com>:
> A question about the sparse matrix and suggestion for speed improvement.
>
> I have a code of 1000*1000 sparse matrix of 11 full diagonals. Using csr.
> Assigning the matrix via setdiags takes 14 seconds, of which 13
> seconds is in the check_format function of sparse/base.py.
>
> This is due to setdiag doing:
>
>     self[i, i + k] = v
>
> while setitem in compressed.py always does a
>
>     self.check_format(full_check=True)
>
> Removing the check_format reduces assignment of the matrix to 1 second.
> Is it really needed that the assignmentof setdiag is again checked via
> check_format?
> Allowing for a setitem that does not call check_format would be usefull.
>
> One could set a dirty flag on a setitem call, and do a check_format
> only when a computation is performed and the flag is dirty. On the
> other hand, I'm not making errors, and to have a flag like
> NOCHECKS=TRUE would be handy as it would speed up the code a lot.
> I could create a solution that is acceptable for scipy, but then I'd
> need to know how such a solution should look like.
>
> Note that there are quite some zeroes in my diagonals now too, that
> are now set to 0., if setitem would be fast, I could drop setdiag
> call, and do a fast setitem call. Or are sparse matrixes supposed to
> be used differently?

Hi Pavol,

Try creating your matrix with the spdiags() function or the related
dia_matrix class.  Constructing a matrix from diagonals with one of
these methods will be at least 100x faster than inserting into a
csr_matrix.

We don't optimize the csr_matrix element insertion code because it's
inherently slow, checks or no checks.  Every time you introduce a new
nonzero the csr_matrix must perform an O(N) update to the underlying
data structure (hence, inserting N elements is O(N^2)).

-- 
Nathan Bell wnbell@gmail.com
http://www.wnbell.com/


More information about the SciPy-Dev mailing list