# [SciPy-User] SPARSE matrix dtypes, upcasting, sum

Fri Sep 9 10:43:51 CDT 2011

```
David,
Our operations are y <- Ax where A is a binary sparse matrix (hence the uint8) but x and the result y are float vectors.  The binary sparse matrix saves memory, but is it really efficient if the resulting operation upcasts the result to a float?
Dinesh

Message: 4
Date: Thu, 8 Sep 2011 09:28:48 -0400
From: David Cournapeau <cournape@gmail.com>
Subject: Re: [SciPy-User] SPARSE matrix dtypes, upcasting, sum function
To: SciPy Users List <scipy-user@scipy.org>
Message-ID: <CAGY4rcXPj_1Ccn_=DK939gx5BTAkvvCtA_RKnTmjOEP5HuL-BA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Thu, Sep 8, 2011 at 8:35 AM, Dinesh B Vadhia wrote:
> We have:
>
> I > 250000, J > 250000, nnz>10000000
>
> data = scipy.ones(nnz, dtype=numpy.uint8)
> A = sparse.csr_matrix((data, (xrow, xcolumn)), shape=(I,J))
>
> where xrow and xcolumn are int vectors of length nnz
>
> The row and column sums are:
> rowsum = A.sum(0)
> columnsum = A.sum(1)
>
> The max values given for each by SciPy are:
> rowsum.max() = 255
> columnsum.max() = 255
>
> But, the real values are:
> rowsum.max() = 41190
> columnsum.max() = 1080
>
> Can someone see what we are doing wrong?

It is at least a documentation bug, and I would have expected
upcasting as well. Note, however, that using integers will always carry
some potential for overflow, which is platform dependent (because
the default upcasting rules use different sizes on different
platforms).

For example:

import numpy as np
a = 1024 * np.ones((4_000_000, 2), dtype=np.int16)
a.sum(0)

will give you the right answer on 64-bit Python on Mac OS X, but
the wrong one on 32-bit. As soon as you are doing operations which
can potentially overflow, I would advise converting to float values.

cheers,

David

```
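The 255 ceiling Dinesh reports is the signature of sums being accumulated in the matrix's own `uint8` dtype, so they wrap at the dtype's maximum. A minimal sketch of a version-independent workaround, casting to a wider dtype before reducing (the shape and data here are illustrative, not the matrix from the thread):

```python
import numpy as np
import scipy.sparse as sp

# A binary sparse matrix with 300 ones stacked in column 0.
# The true column sum (300) does not fit in uint8.
rows = np.arange(300)
cols = np.zeros(300, dtype=np.intp)
data = np.ones(300, dtype=np.uint8)
A = sp.csr_matrix((data, (rows, cols)), shape=(300, 2))

# Cast to a wide integer dtype before reducing, so the accumulator
# cannot wrap at 255 no matter how sum() chooses its working dtype.
colsum = A.astype(np.int64).sum(axis=0)
print(colsum)  # column 0 sums to 300, column 1 to 0
```

Recent SciPy releases upcast integer sums on their own, but the explicit `astype` makes the result independent of the SciPy version in use.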
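On Dinesh's y <- Ax question: when the matrix and vector dtypes differ, SciPy upcasts the product, so a `uint8` pattern matrix times a `float64` vector yields a `float64` result. The `uint8` storage saving applies to the matrix at rest; whether the multiply is as fast as an all-float product depends on the SciPy version. A small sketch (the matrix contents are made up for illustration):

```python
import numpy as np
import scipy.sparse as sp

# Binary pattern matrix stored as uint8 to save memory.
rows = np.array([0, 1, 1])
cols = np.array([1, 0, 2])
A = sp.csr_matrix((np.ones(3, dtype=np.uint8), (rows, cols)), shape=(2, 3))

x = np.array([1.5, 2.5, 4.0])  # float64 input vector

y = A @ x  # product is upcast to float64
print(y.dtype, y)
```

Here row 0 picks up x[1] = 2.5 and row 1 picks up x[0] + x[2] = 5.5, and `y.dtype` is `float64` even though `A.dtype` is `uint8`.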