[SciPy-user] scipy.sparse: coo_matrix ignores sum_duplicates=False

James Philbin philbinj@gmail....
Mon Oct 13 17:33:45 CDT 2008


> By "ignored" do you mean that you want only the first or last value to be used?
My use case is perhaps a bit non-standard. I'm approximately computing
a large pairwise similarity matrix distributed across multiple
processes. The algorithm will sometimes output the same pairwise
distance more than once, so all the subsequent values will be the
same. I think dok_matrix is fine for my needs. BTW, I've found that
__setitem__ is very slow for dok_matrix. Is this just because of the
checks which are made? Using dict.__setitem__(mat, (r,c), val) is
about an order of magnitude faster.
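
For illustration only (this is not code from the thread), here is a minimal sketch of the same idea that avoids dok_matrix's __setitem__ altogether. It collects the entries in a plain Python dict keyed by (row, col), so a repeated pair simply overwrites the identical earlier value, and then builds the sparse matrix once at the end. The helper name build_from_pairs and the sample data are hypothetical. A plain dict is used because, depending on the SciPy version, dok_matrix may not be a dict subclass, in which case the dict.__setitem__ shortcut would not apply.

```python
import numpy as np
from scipy.sparse import coo_matrix


def build_from_pairs(pairs, shape):
    """Build a CSR matrix from (row, col, value) triples.

    Hypothetical helper: repeated (row, col) keys overwrite the earlier
    value instead of being summed, which is harmless when duplicates
    carry the same value.
    """
    entries = {}
    for r, c, val in pairs:
        entries[(r, c)] = val  # plain dict assignment, no sparse-matrix checks

    # Each (row, col) key is now unique, so nothing is left to be summed
    # when the COO matrix is converted to CSR.
    rows = np.array([k[0] for k in entries], dtype=np.intp)
    cols = np.array([k[1] for k in entries], dtype=np.intp)
    data = np.array(list(entries.values()), dtype=float)
    return coo_matrix((data, (rows, cols)), shape=shape).tocsr()


# The pair (0, 1) is emitted twice with the same value.
pairs = [(0, 1, 0.5), (2, 0, 0.25), (0, 1, 0.5)]
sim = build_from_pairs(pairs, shape=(3, 3))
```

Whether this is faster than dok_matrix for a given workload is something to measure rather than assume.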

> Summing duplicates when converting COO->CSR is fairly common (e.g.
> UMFPACK does it) and quite useful if you're assembling FEM matrices.
> Furthermore, regarding duplicate entries as parts of a sum is
> necessary if one wants to maintain consistency with matrix-vector
> multiplication (i.e A*x == A.tocsr() * x).  In theory you could change
> this as well, but it would be *very* costly.
I'm not arguing that summing duplicate entries is not desirable. I'm
just arguing that a function which reads .tocsr(sum_duplicates=False)
and then sums the duplicates implicitly is misnamed.
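
To make the behaviour under discussion concrete, here is a small sketch (not taken from the thread; the sample values are made up) of a COO matrix that stores the same position twice. It shows that the matrix-vector product treats the duplicates as a sum and that converting to CSR merges them, which is the consistency property quoted above. No keyword arguments are passed to tocsr(), since the accepted arguments may differ between SciPy versions.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Position (0, 1) is stored twice, with values 2.0 and 3.0.
rows = np.array([0, 0, 1])
cols = np.array([1, 1, 0])
data = np.array([2.0, 3.0, 4.0])
A = coo_matrix((data, (rows, cols)), shape=(2, 2))

x = np.array([1.0, 10.0])

# The COO product adds every stored entry, so duplicates act as a sum.
y_coo = A @ x

# Converting to CSR merges the two (0, 1) entries into one.
B = A.tocsr()
y_csr = B @ x

print(A.nnz, B.nnz)   # stored entries before and after conversion
print(y_coo, y_csr)   # the two products agree
```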

> FYI, others have expressed an interest in more general accumulation methods:
> http://thread.gmane.org/gmane.comp.python.scientific.devel/7667
This is never something I've needed, but I agree it could be useful.

Thanks,
James

