[SciPy-user] scipy.sparse: coo_matrix ignores sum_duplicates=False
Nathan Bell
wnbell@gmail....
Mon Oct 13 21:57:15 CDT 2008
On Mon, Oct 13, 2008 at 6:33 PM, James Philbin <philbinj@gmail.com> wrote:
>
> same. I think dok_matrix is fine for my needs. BTW, I've found that
> __setitem__ is very slow for dok_matrix. Is this just because of the
> checks which are made? Using dict.__setitem__(mat, (r,c), val) is
> about an order of magnitude faster.
I don't use dok_matrix, so I don't know why it would be that much
slower. If you can speed it up and submit a patch I'd happily apply
it.
> I'm not arguing that summing duplicate entries is not desirable. I'm
> just arguing that a function which reads .tocsr(sum_duplicates=False)
> and then sums the duplicates implicitly is misnamed.
Please understand, it *does not* sum the duplicates. As I illustrated
before, the duplicates are carried over to the CSR format. It's just
that CSR->dense *does* sum duplicates.
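To make the distinction concrete, here is a minimal pure-Python sketch of the behaviour described above. It is not the scipy.sparse implementation; the helper names coo_to_csr_keep_duplicates and csr_to_dense are made up for illustration, and plain lists stand in for the real index and data arrays. The conversion stores each duplicate in its own slot, and only the dense conversion adds them together.

```python
def coo_to_csr_keep_duplicates(n_rows, rows, cols, vals):
    """Convert COO triplets to CSR arrays without merging duplicates.

    Hypothetical helper for illustration only, not a SciPy function.
    """
    # Count the entries in each row, then prefix-sum to get row pointers.
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]

    # Scatter every entry into the next free slot of its row.
    # A duplicate (row, col) pair simply occupies a second slot.
    next_slot = indptr[:-1]  # slicing a list already makes a copy
    indices = [0] * len(vals)
    data = [0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        indices[k] = c
        data[k] = v
        next_slot[r] += 1
    return indptr, indices, data


def csr_to_dense(n_rows, n_cols, indptr, indices, data):
    """Expand CSR arrays to a dense list of lists."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            # The += is the only place where duplicates get summed.
            dense[i][indices[k]] += data[k]
    return dense


# The entry (0, 1) appears twice, with values 2 and 3.
rows, cols, vals = [0, 0, 1], [1, 1, 0], [2, 3, 4]
indptr, indices, data = coo_to_csr_keep_duplicates(2, rows, cols, vals)
dense = csr_to_dense(2, 2, indptr, indices, data)
```

In this sketch the CSR arrays still hold three stored values, so nothing was summed during the conversion; the combined value only appears once the dense array is built.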
I agree that sum_duplicates=False is somewhat ambiguous. Do you have a
suggestion for how this could be made clearer? For instance, would
an interface like:
    coo_matrix.tocsr(duplicates='sum')
    coo_matrix.tocsr(duplicates='last')
    coo_matrix.tocsr(duplicates='max')
be preferred? If I understand correctly, you'd want to use
.tocsr(duplicates='last').
Another question is whether we want to put this in the COO->CSR (and
CSC) conversions. At this point, I think COO->CSR should *always* sum
duplicates together and we should instead provide a separate function
or member function of coo_matrix that provides additional options,
like 'last', 'max', etc. In general, any binary operator (T,T) -> T
could be used as an accumulator, but we would provide the most common
options.
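As a rough sketch of the accumulator idea, the following pure-Python function merges duplicate COO entries with an arbitrary binary operator. The name merge_duplicates and its signature are hypothetical, not an existing or planned scipy.sparse function, and a dict keyed by (row, col) stands in for whatever storage a real implementation would use.

```python
import operator


def merge_duplicates(rows, cols, vals, accumulate):
    """Merge duplicate (row, col) entries with a binary operator (T, T) -> T.

    Hypothetical helper for illustration only, not a SciPy function.
    """
    merged = {}
    for r, c, v in zip(rows, cols, vals):
        key = (r, c)
        if key in merged:
            # Combine the value seen so far with the new one.
            merged[key] = accumulate(merged[key], v)
        else:
            merged[key] = v
    return merged


# The entry (0, 1) appears twice, with values 5 and then 3.
rows, cols, vals = [0, 0, 1], [1, 1, 0], [5, 3, 4]

summed = merge_duplicates(rows, cols, vals, operator.add)        # 'sum'
largest = merge_duplicates(rows, cols, vals, max)                # 'max'
last = merge_duplicates(rows, cols, vals, lambda old, new: new)  # 'last'
```

With these inputs the (0, 1) entry comes out as 8 for 'sum', 5 for 'max', and 3 for 'last', while the unique (1, 0) entry stays at 4 in all three cases.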
--
Nathan Bell wnbell@gmail.com
http://graphics.cs.uiuc.edu/~wnbell/