[Numpy-discussion] sum and mean methods behaviour

Todd Miller jmiller at stsci.edu
Tue Sep 2 11:34:04 CDT 2003


On Mon, 2003-09-01 at 05:34, Peter Verveer wrote:
> Hi All,
> 
> I noticed that the sum() and mean() methods of numarrays use the precision of 
> the given array in their calculations. That leads to results like this:
> 
> >>> array([255, 255], Int8).sum()
> -2
> >>> array([255, 255], Int8).mean()
> -1.0
> 
> Would it not be better to use double precision internally and return the 
> correct result?
> 
> Cheers, Peter
> 

Hi Peter,

I thought about this a lot yesterday, and today I talked it over with
Perry. There are several ways to fix the problem with mean() and
sum(), and I'm hoping that you and the rest of the community will help
sort them out.

(1) The first "solution" is to require users to do their own up-casting
prior to calling mean() or sum(). This gives the end user fine control
over storage cost but leaves the C-like overflow pitfall you discovered.
I mention this because this is how the numarray/Numeric reductions are
designed. Is there a reason why the numarray/Numeric reductions don't
implicitly up-cast?

(2) The second way is what you proposed: use double precision within
mean and sum. This has great simplicity but gives no control over
storage usage, and as implemented, the storage would be much higher than
one might think, potentially 8x that of the input (an Int8 array
converted to Float64, for example).

(3) Lastly, Perry suggested a more radical approach: rather than
changing the mean and sum methods themselves, we could alter the
universal function accumulate and reduce methods to implicitly use
additional precision. Perry's idea was to make all accumulations and
reductions up-cast their results to the largest type of the current
family, either Bool, Int64, Float64, or Complex64. By doing this, we
can improve the utility of the reductions and accumulations as well as
fix the problem with sum and mean.
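
To make the three options concrete, here is a minimal sketch. It uses
present-day NumPy as a stand-in for numarray, which is an assumption:
the dtype= argument, the input values, and the helper names
widest_in_family and reduce_upcast are illustrative and not part of
numarray. The sketch also assumes numarray's Complex64 corresponds to
NumPy's complex128, and it adds an unsigned family that the list above
does not mention.

```python
import numpy as np

# Values chosen to fit in int8, so only the reduction itself overflows.
a = np.array([127, 127], dtype=np.int8)

# Reduction carried out in the array's own precision: 254 wraps to -2.
wrapped = np.add.reduce(a, dtype=np.int8)        # -2

# (1) The caller up-casts explicitly before reducing.
explicit = a.astype(np.int64).sum()              # 254

# (2) Accumulate in double precision regardless of the input type.
as_double = np.add.reduce(a, dtype=np.float64)   # 254.0

# A full Float64 copy of an Int8 array needs 8x the storage.
storage_ratio = a.astype(np.float64).nbytes // a.nbytes   # 8

# (3) Hypothetical helpers: reduce in the widest type of the input's
# family. The mapping is an assumption based on the list above.
_WIDEST = {
    "b": np.bool_,       # Bool
    "i": np.int64,       # Int64
    "u": np.uint64,      # unsigned family (not named in the proposal)
    "f": np.float64,     # Float64
    "c": np.complex128,  # assumed equivalent of numarray's Complex64
}


def widest_in_family(dtype):
    """Return the widest type in the same family as dtype."""
    return _WIDEST[np.dtype(dtype).kind]


def reduce_upcast(arr):
    """Sum arr using the widest type of its family as the accumulator."""
    return np.add.reduce(arr, dtype=widest_in_family(arr.dtype))


family_sum = reduce_upcast(a)                    # 254, as int64
```

With option (3) the caller never has to think about the accumulator
type, at the cost of always getting the widest result type back.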

-- 
Todd Miller 			jmiller at stsci.edu
STSCI / ESS / SSB
