[SciPy-User] Sparse Matrices, summing columns !=sum

gabe.g@m... gabe.g@m...
Thu Nov 15 11:19:40 CST 2012


Hi all, I'm new to this list and have a specific question about sparse matrices.

I posted this to StackOverflow and Anony-Mousse suggested it might be a bug and that I should post to this list:

So, is this a bug or am I doing something wrong?
Thanks much.
Gabe


I have a sparse matrix that I arrived at through a complicated series of calculations which I cannot reproduce here. I will try to find a simpler example that shows the same behavior.

For now, does anyone know how it might be (even remotely) possible that I could have a sparse matrix X with the property that:

In [143]: X.sum(0).sum()
Out[143]: 131138

In [144]: X.sum()
Out[144]: 327746

In [145]: X.sum(1).sum()
Out[145]: 327746

In [146]: type(X)
Out[146]: scipy.sparse.csr.csr_matrix

My only guess is that to sum columns correctly I first need to convert the matrix to CSC format, which would make sense. Still, one would expect the sparse package to handle column sums gracefully (or raise an error) rather than silently return a WRONG answer.

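After more thought, I tried converting X to each of the other sparse formats before summing. The column-sum total is the same wrong value every time: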

In [164]: X.tocsr().sum(0).sum()
Out[164]: 131138

In [165]: X.tocsc().sum(0).sum()
Out[165]: 131138

In [166]: X.tocoo().sum(0).sum()
Out[166]: 131138

In [167]: X.tolil().sum(0).sum()
Out[167]: 131138

In [168]: X.todok().sum(0).sum()
Out[168]: 131138

In [169]: X.shape
Out[169]: (196980, 43)

In [170]: X
Out[170]: 
<196980x43 sparse matrix of type '<type 'numpy.uint16'>'
        with 70875 stored elements in Compressed Sparse Row format>

In [172]: X.todense().sum(0)
Out[172]: 
matrix([[170726,   1041, 117398,   3526,  13202,   3585,   2355,   1895,   1392,   2189,   2070,   2603,   1676,    496,   1194,    933,    129,
            529,    544,    256,      7,      0,      0,      0,      0,      0,      0,      0,      0,      0,      0,      0,      0,      0,
              0,      0,      0,      0,      0,      0,      0,      0,      0]], dtype=uint64)

In [173]: X.sum(0)
Out[173]: 
matrix([[39654,  1041, 51862,  3526, 13202,  3585,  2355,  1895,  1392,  2189,  2070,  2603,  1676,   496,  1194,   933,   129,   529,   544,   256,
             7,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0]], dtype=uint16)
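
One reading of the two outputs above, offered as a hypothesis rather than a confirmed diagnosis: the dense sum is reported as uint64 while the sparse sum is reported as uint16, and the only two columns that disagree are the ones whose true totals exceed 65535. The sketch below checks that arithmetic and shows the same wrap-around on a small made-up matrix. The toy values (60000, 1, 2, 3) and the variable names are illustrative and not taken from the real X. The narrow accumulation is forced explicitly with NumPy, since whether X.sum(0) itself keeps the uint16 dtype may depend on the SciPy version.

```python
import numpy as np
from scipy import sparse

# The two columns that differ between X.todense().sum(0) and X.sum(0) above.
dense_sums = np.array([170726, 117398], dtype=np.int64)
sparse_sums = np.array([39654, 51862], dtype=np.int64)

# If the sparse sum accumulated in uint16, it would wrap modulo 2**16.
wrapped = dense_sums % 2**16

# The gap between X.sum() and X.sum(0).sum() reported earlier.
total_gap = 327746 - 131138

# Toy reproduction of the wrap-around with plain NumPy (illustrative values).
col = np.full(3, 60000, dtype=np.uint16)
narrow = int(col.sum(dtype=np.uint16))   # accumulate in uint16: wraps silently
wide = int(col.sum(dtype=np.uint64))     # accumulate in uint64: true total

# Possible workaround: widen the stored dtype before summing columns.
X_toy = sparse.csr_matrix(
    np.array([[60000, 1], [60000, 2], [60000, 3]], dtype=np.uint16)
)
safe_col_sums = np.asarray(X_toy.astype(np.uint64).sum(axis=0)).ravel()

print(wrapped, total_gap, narrow, wide, safe_col_sums)
```

If this reading is right, the storage format (CSR, CSC, COO, LIL, DOK) would be irrelevant, which matches the identical results from all five conversions; widening the dtype with astype before summing would be the thing to try on the real X.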