[SciPy-User] Sparse Matrices, summing columns !=sum
Lars Buitinck
L.J.Buitinck@uva...
Fri Nov 16 09:08:46 CST 2012
2012/11/16 <scipy-user-request@scipy.org>:
> Subject: [SciPy-User] Sparse Matrices, summing columns !=sum
> To: scipy-user@scipy.org
>
> I have a sparse matrix that I arrived at through a complicated bunch of calculations which I cannot reproduce here. I will try to find a simpler example of this.
>
> For now, does anyone know how it might be (even remotely) possible that I could have a sparse matrix X with the property that:
>
> In [143]: X.sum(0).sum()
> Out[143]: 131138
I tried this in SciPy 0.7.2, constructing X from the dense matrix
you posted below (your result for X.todense().sum(0)). So:
>>> X
matrix([[39654,  1041, 51862,  3526, 13202,  3585,  2355,  1895,  1392,
          2189,  2070,  2603,  1676,   496,  1194,   933,   129,   529,
           544,   256,     7,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0]], dtype=uint16)
>>> Xs = csr_matrix(X)
>>> Xs
<1x43 sparse matrix of type '<type 'numpy.uint16'>'
with 21 stored elements in Compressed Sparse Row format>
With this, I get
>>> Xs.sum(axis=1)
matrix([[66]], dtype=uint16)
>>> X.sum(axis=1)
matrix([[131138]], dtype=uint64)
>>> Xs.toarray().sum(axis=1)
array([131138], dtype=uint64)
So it would seem that csr_matrix.sum tries to retain the dtype for the
sum, overflowing np.uint16 halfway through the computation, while
np.matrix.sum picks a larger integer type. One workaround would be to
convert the sparse matrix to a larger dtype (e.g. np.uint64) before summing.
I haven't tried this on more recent SciPy.
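
A minimal sketch of the overflow and the workaround, assuming NumPy and a
reasonably recent SciPy are installed. The names `values`, `wrapped` and
`total` are illustrative only. Since newer SciPy versions may already
accumulate in a wider type, the wraparound is reproduced here with a plain
NumPy sum whose accumulator is forced to stay uint16, not with
csr_matrix.sum itself.

```python
import numpy as np
from scipy.sparse import csr_matrix

# The 21 nonzero column sums from the matrix above, padded with zeros to 43 columns.
values = [39654, 1041, 51862, 3526, 13202, 3585, 2355, 1895, 1392,
          2189, 2070, 2603, 1676, 496, 1194, 933, 129, 529,
          544, 256, 7]
X = np.array([values + [0] * 22], dtype=np.uint16)

# Keeping the accumulator at uint16 wraps modulo 2**16,
# which is the kind of result csr_matrix.sum returned in the session above.
wrapped = X.sum(dtype=np.uint16)
print(int(wrapped), 131138 % 2**16)

# Workaround: widen the sparse matrix's dtype before summing.
Xs = csr_matrix(X).astype(np.uint64)
total = Xs.sum()
print(int(total))
```

The true total, 131138, is just above 2 * 65536 = 131072, so a 16-bit
accumulator wraps around twice and leaves only the remainder.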
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam