[Numpy-discussion] Silent overflow of Int32 array
Todd Miller
jmiller at stsci.edu
Sun Apr 10 07:25:08 CDT 2005
On Sun, 2005-04-10 at 10:23 +1000, Tim Churches wrote:
> I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit
> Linux):
>
> >>> import Numeric as N
> >>> a = N.array((2000000000,1000000000),typecode=N.Int32)
> >>> N.add.reduce(a)
> -1294967296
>
> OK, it is an elementary mistake, but the silent overflow caught me
> unawares. Casting the array to Float64 before summing it avoids the
> error, but in my instance the actual data is a rank-1 array of 21
> million integers with a mean value of about 140 (which adds up to more
> than sys.maxint), and casting to Float64 will use quite a lot of memory
> (as well as taking some time).
>
> Any advice for catching or avoiding such overflow without always
> incurring a performance and memory hit by always casting to Float64?
Here's what numarray does:
>>> import numarray as N
>>> a = N.array((2000000000,1000000000),typecode=N.Int32)
>>> N.add.reduce(a)
-1294967296
So basic reductions in numarray have the same "careful while you're
shaving" behavior as Numeric; it's fast but easy to screw up.
But:
>>> a.sum()
3000000000L
>>> a.sum(type='d')
3000000000.0
a.sum() upcasts blockwise, on the fly, to the largest type of its kind --
in this case, Int64. This avoids the storage overhead of typecasting the
entire array.
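The blockwise strategy can be sketched by hand. The helper `blockwise_sum` below is a hypothetical illustration written against modern NumPy, not numarray's actual implementation: each block is upcast to a wide accumulator type as it is summed, so the extra storage is one block rather than a full copy of the array.

```python
import numpy as np

def blockwise_sum(a, blocksize=65536):
    # Upcast one block at a time to int64 and accumulate; peak extra
    # memory is one block of int64, never a full int64 copy of `a`.
    total = np.int64(0)
    flat = a.ravel()  # note: ravel() may copy if `a` is discontiguous
    for start in range(0, flat.size, blocksize):
        total += flat[start:start + blocksize].astype(np.int64).sum()
    return int(total)

a = np.array([2_000_000_000, 1_000_000_000], dtype=np.int32)
print(blockwise_sum(a))  # 3000000000
```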
A better name for the method would have been sumall(), since it sums all
elements of a multi-dimensional array. The flattening process reduces
along one dimension before flattening, which prevents a full copy of a
discontiguous array. It could be smarter about choosing the dimension
of the initial reduction.
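The reduce-then-flatten idea can be shown in modern NumPy terms (a sketch, not numarray's code): reducing a discontiguous array along one axis first yields a small contiguous intermediate, which is then cheap to sum, so no full copy is required.

```python
import numpy as np

# A transposed view is discontiguous; flattening it directly would copy.
m = np.arange(12, dtype=np.int32).reshape(3, 4).T

# Reduce along one axis first: the intermediate is a small contiguous
# array, and summing it gives the same total as flattening everything.
partial = m.sum(axis=0, dtype=np.int64)
grand_total = int(partial.sum())
print(grand_total)  # 66 (the sum 0 + 1 + ... + 11)
```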
Regards,
Todd