[SciPy-user] scipy.sparse: coo_matrix ignores sum_duplicates=False

James Philbin philbinj@gmail....
Tue Oct 14 03:56:42 CDT 2008


> Please understand, it *does not* sum the duplicates.  As I illustrated
> before, the duplicates are carried over to the CSR format.  It's just
> that CSR->dense *does* sum duplicates.
No, I do understand. This is why I said 'implicitly'. The CSR keeps
the duplicates, but then always behaves as if they'd been summed. To
the user, therefore, the effect is the same.
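The snippet below is a minimal sketch of that behaviour, assuming a recent SciPy release. It builds a CSR matrix directly from (data, indices, indptr) arrays with a repeated column index. It does not go through coo_matrix.tocsr(), because in newer SciPy versions that conversion appears to sum duplicates itself. The stored entry count still includes the duplicate, while the dense view adds the two values.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Row 0 stores two entries for column 1 (a duplicate); row 1 stores one entry.
data = np.array([1.0, 2.0, 5.0])
indices = np.array([1, 1, 0])
indptr = np.array([0, 2, 3])
A = csr_matrix((data, indices, indptr), shape=(2, 2))

print(A.nnz)        # 3: both duplicate entries are still stored
print(A.toarray())  # the dense view adds them, so A[0, 1] is 3.0

# Merging the duplicates explicitly changes the storage, not the values.
A.sum_duplicates()
print(A.nnz)        # 2
print(A.toarray())  # same dense result as before
```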

> I agree that sum_duplicates=False is somewhat ambiguous, do you have a
> suggestion for how this could be made more clear?  For instance, would
> an interface like:
>  coo_matrix.tocsr(duplicates='sum')
>  coo_matrix.tocsr(duplicates='last')
>  coo_matrix.tocsr(duplicates='max')
> be preferred?  If I understand correctly, you'd want to use
> .tocsr(duplicates='last').
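To make the proposed duplicates='last' semantics concrete, here is a minimal sketch built around a hypothetical helper, tocsr_keep_last. It is not part of scipy.sparse. For each (row, col) position it keeps only the value given last in input order, then builds a CSR matrix from the de-duplicated triplets.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

def tocsr_keep_last(A):
    """Hypothetical helper: COO -> CSR keeping the last duplicate, not the sum."""
    A = coo_matrix(A)
    n_cols = A.shape[1]
    # One integer key per (row, col) position.
    keys = A.row.astype(np.int64) * n_cols + A.col
    # np.unique reports the *first* occurrence of each key, so search the
    # reversed keys and map those hits back to positions in the original order.
    _, rev_idx = np.unique(keys[::-1], return_index=True)
    keep = len(keys) - 1 - rev_idx
    # No duplicates remain, so summing during the CSR conversion is a no-op.
    return csr_matrix((A.data[keep], (A.row[keep], A.col[keep])), shape=A.shape)

# Two values are given for position (0, 1): 1.0 first, then 2.0.
A = coo_matrix(([1.0, 5.0, 2.0], ([0, 1, 0], [1, 0, 1])), shape=(2, 2))
print(A.toarray())                   # position (0, 1) holds the sum, 3.0
print(tocsr_keep_last(A).toarray())  # position (0, 1) holds the last value, 2.0
```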
I'm not sure it's worth your implementing something that I doubt
many people really need. I don't want scipy.sparse to get
feature-itis. I'd be happy if the sum_duplicates parameter were
removed altogether, with the standard behaviour being that of
sum_duplicates=True. Then just state clearly in the docstring what
that behaviour is.

> Another question is whether we want to put this in the COO->CSR (and
> CSC) conversions.  At this point, I think COO->CSR should *always* sum
> duplicates together and we should instead provide a separate function
> or member function of coo_matrix that provides additional options,
> like 'last', 'max', etc.  In general, any binary operator (T,T) -> T
> could be used as an accumulator, but we would provide the most common
> options.
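The idea that any binary operator (T,T) -> T could serve as the accumulator can be sketched as a left fold over each group of duplicates. The helper below, merge_with, is hypothetical. It is written in plain Python for clarity rather than speed, and a real implementation would presumably vectorize the common cases.

```python
from scipy.sparse import coo_matrix

def merge_with(A, op):
    """Hypothetical helper: fold duplicate (row, col) entries of a COO matrix with op."""
    A = coo_matrix(A)
    merged = {}
    # Visit entries in stored order, so op(a, b) always receives the earlier value as a.
    for i, j, v in zip(A.row.tolist(), A.col.tolist(), A.data.tolist()):
        key = (i, j)
        merged[key] = op(merged[key], v) if key in merged else v
    rows = [i for i, _ in merged]
    cols = [j for _, j in merged]
    vals = list(merged.values())
    return coo_matrix((vals, (rows, cols)), shape=A.shape)

# Position (0, 1) receives 1.0, then 4.0, then 2.0.
A = coo_matrix(([1.0, 5.0, 4.0, 2.0], ([0, 1, 0, 0], [1, 0, 1, 1])), shape=(2, 2))

# The common options are just particular choices of op.
summed = merge_with(A, lambda a, b: a + b)  # (0, 1) -> 7.0
largest = merge_with(A, max)                # (0, 1) -> 4.0
first = merge_with(A, lambda a, b: a)       # (0, 1) -> 1.0
last = merge_with(A, lambda a, b: b)        # (0, 1) -> 2.0
```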
This seems fine, but in general I don't like modal options, as they
tend to be bug-prone. Perhaps add a separate coo_matrix method called
'merge_duplicates' that applies an operation in place, with the user
specifying 'sum', 'max', 'first', 'last', etc.
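Below is a minimal vectorized sketch of that kind of separate, in-place operation. merge_duplicates here is a hypothetical free function, not an existing coo_matrix method. It sorts entries stably by position. It then reduces each run of duplicates with a NumPy ufunc ('sum', 'max', 'min') or picks the run's first or last element. Finally, it overwrites the matrix's row, col and data arrays.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Named modes that map directly onto binary NumPy ufuncs.
_REDUCERS = {"sum": np.add, "max": np.maximum, "min": np.minimum}

def merge_duplicates(A, how="sum"):
    """Hypothetical: merge duplicate (row, col) entries of coo_matrix A in place."""
    if A.nnz == 0:
        return
    # Stable sort by (row, col); np.lexsort uses the last key as the primary key.
    order = np.lexsort((A.col, A.row))
    row, col, data = A.row[order], A.col[order], A.data[order]
    # Mark the start of each run of identical (row, col) positions.
    new_run = np.ones(len(row), dtype=bool)
    new_run[1:] = (row[1:] != row[:-1]) | (col[1:] != col[:-1])
    starts = np.flatnonzero(new_run)
    if how in _REDUCERS:
        merged = _REDUCERS[how].reduceat(data, starts)
    elif how == "first":
        merged = data[starts]
    elif how == "last":
        ends = np.append(starts[1:], len(data)) - 1
        merged = data[ends]
    else:
        raise ValueError(f"unknown merge mode: {how!r}")
    A.row, A.col, A.data = row[starts], col[starts], merged

# Position (0, 1) receives 1.0, then 4.0, then 2.0.
A = coo_matrix(([1.0, 5.0, 4.0, 2.0], ([0, 1, 0, 0], [1, 0, 1, 1])), shape=(2, 2))
merge_duplicates(A, "last")
print(A.nnz)        # 2
print(A.toarray())  # (0, 1) -> 2.0, (1, 0) -> 5.0
```

Keeping this as its own call would let tocsr() stay non-modal and always sum, with the less common policies available only when explicitly requested.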

Thanks,
James

