[SciPy-User] Sparse Matrices, summing columns !=sum

Lars Buitinck L.J.Buitinck@uva...
Fri Nov 16 09:08:46 CST 2012


2012/11/16  <scipy-user-request@scipy.org>:
> Subject: [SciPy-User] Sparse Matrices, summing columns !=sum
> To: scipy-user@scipy.org
>
> I have a sparse matrix that I arrived at through a complicated bunch of calculations which I cannot reproduce here. I will try to find a simpler example of this.
>
> For now, does anyone know how it might be (even remotely) possible that I could have a sparse matrix X with the property that:
>
> In [143]: X.sum(0).sum()
> Out[143]: 131138

I tried this in SciPy 0.7.2, constructing X from the dense matrix you
posted below (your result for X.todense().sum(0)). So

>>> X
matrix([[39654,  1041, 51862,  3526, 13202,  3585,  2355,  1895,  1392,
          2189,  2070,  2603,  1676,   496,  1194,   933,   129,   529,
           544,   256,     7,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0]], dtype=uint16)
>>> Xs = csr_matrix(X)
>>> Xs
<1x43 sparse matrix of type '<type 'numpy.uint16'>'
	with 21 stored elements in Compressed Sparse Row format>

With this, I get

>>> Xs.sum(axis=1)
matrix([[66]], dtype=uint16)
>>> X.sum(axis=1)
matrix([[131138]], dtype=uint64)
>>> Xs.toarray().sum(axis=1)
array([131138], dtype=uint64)

So it would seem that csr_matrix.sum retains the input dtype for the
sum, so the total wraps around in np.uint16 partway through the
computation (131138 mod 65536 = 66), while np.matrix.sum picks a
larger integer type. One workaround would be to use a larger dtype,
e.g. by converting the matrix before summing.

I haven't tried this on more recent SciPy.
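
A minimal sketch of the workaround, under a few assumptions: whether
csr_matrix.sum itself wraps may depend on the SciPy version, so the
wraparound is reproduced here by forcing a uint16 accumulator in NumPy
rather than relying on the sparse sum. The names values, wrapped and
wide_total are only illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

# The 21 nonzero entries from the matrix above; the remaining 22 are zero.
values = [39654, 1041, 51862, 3526, 13202, 3585, 2355, 1895, 1392,
          2189, 2070, 2603, 1676, 496, 1194, 933, 129, 529,
          544, 256, 7]

X = np.zeros((1, 43), dtype=np.uint16)
X[0, :len(values)] = values
Xs = csr_matrix(X)

# Forcing a uint16 accumulator wraps the total modulo 2**16.
wrapped = int(X.sum(dtype=np.uint16))

# Workaround: widen the dtype first, then sum the sparse matrix.
wide_total = int(Xs.astype(np.uint64).sum())

print(wrapped, wide_total)
```

Converting with astype copies the stored values, which is cheap here
but worth keeping in mind for very large matrices.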

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

