[Numpy-discussion] please change mean to use dtype=float

Christopher Barker Chris.Barker at noaa.gov
Fri Sep 22 11:34:42 CDT 2006

Tim Hochberg wrote:
> It would probably be nice to expose the 
> Kahan sum and maybe even the raw_kahan_sum somewhere.

What about using it for .sum() by default? What is the speed hit anyway? 
In any case, having it available would be nice.
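
For anyone following along who hasn't seen it, here is a minimal sketch of compensated (Kahan) summation next to a plain running sum, both kept in single precision. The names kahan_sum and naive_sum are made up for this illustration; they are not numpy functions and not the raw_kahan_sum mentioned above. The pure-Python loop says nothing about the speed hit of a C implementation.

```python
import numpy as np

def naive_sum(values, dtype=np.float32):
    """Plain running sum, accumulated in `dtype` (hypothetical helper)."""
    total = dtype(0.0)
    for x in values:
        total = total + dtype(x)
    return total

def kahan_sum(values, dtype=np.float32):
    """Kahan compensated sum, accumulated in `dtype` (hypothetical helper)."""
    total = dtype(0.0)
    comp = dtype(0.0)             # running estimate of the lost low-order bits
    for x in values:
        y = dtype(x) - comp       # apply the correction from the previous step
        t = total + y             # low-order bits of y may be lost here
        comp = (t - total) - y    # recover what was lost (with opposite sign)
        total = t
    return total

values = np.full(100_000, 0.1, dtype=np.float32)
exact = len(values) * float(values[0])   # reference computed in double precision

naive_err = abs(float(naive_sum(values)) - exact)
kahan_err = abs(float(kahan_sum(values)) - exact)
print("naive float32 error:", naive_err)
print("kahan float32 error:", kahan_err)
```

The compensated version does about four floating point operations per element instead of one, which is where any speed hit would come from.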

> I'm on the fence on using the array dtype for the accumulator dtype 
> versus always using at least double precision for the accumulator. The 
> former is easier to explain and is probably faster, but the latter is a 
> lot more accuracy for basically free.

I don't think the difficulty of explanation is a big issue at all -- I'm 
having a really hard time imagining someone getting confused and/or 
disappointed that their single precision calculation didn't overflow or 
was more accurate than expected. Anyone who did would know enough to 
understand the explanation.

In general, users expect things to "just work". They only dig into the 
details when something goes wrong, like the mean of a bunch of positive 
integers coming out as negative (wasn't that the OP's example?). The 
fewer such instances we have, the fewer questions we have.
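
A constructed illustration of that kind of surprise (not the OP's actual data, which isn't quoted here): three positive int32 values whose total doesn't fit in an int32. The int32 accumulator is forced explicitly with dtype= so the wraparound shows up regardless of what a given numpy version does by default.

```python
import numpy as np

# Three positive values; their true total (3 * 2**30) exceeds the int32 range.
a = np.full(3, 2**30, dtype=np.int32)

# Accumulate in the array's own dtype: the total silently wraps around.
wrapped_total = a.sum(dtype=np.int32)
wrapped_mean = wrapped_total / len(a)

# Accumulate in double precision instead: no wraparound.
safe_mean = a.mean(dtype=np.float64)

print("mean with int32 accumulator:  ", wrapped_mean)
print("mean with float64 accumulator:", safe_mean)
```

Every input is positive, yet the first result is negative, which is exactly the sort of thing that sends people to the list.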

> speeds shake out I suppose. If the speed of using float64 is comparable 
> to that of using float32, we might as well.

Only testing will tell, but my experience is that with double precision 
FPUs, doubles are just as fast as floats unless you're dealing with 
enough memory to make a difference in caching. In this case, only the 
accumulator is double, so that wouldn't be an issue. I suppose the float 
to double conversion could take some time though...
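
To make the accuracy side of that trade-off concrete, here is a rough sketch comparing a float32 accumulator with a float64 accumulator over the same float32 data. It uses np.cumsum because that accumulates strictly left to right; np.sum in recent numpy versions uses pairwise summation, which hides much of the single precision error. The array size and the value 0.1 are arbitrary choices for illustration, and this says nothing about timing.

```python
import numpy as np

n = 1_000_000
a = np.full(n, 0.1, dtype=np.float32)

# Reference: n times the stored float32 value, computed in double precision.
exact = n * float(a[0])

# Same input data, only the accumulator dtype differs.
sum32 = float(np.cumsum(a, dtype=np.float32)[-1])
sum64 = float(np.cumsum(a, dtype=np.float64)[-1])

err32 = abs(sum32 - exact)
err64 = abs(sum64 - exact)
print("float32 accumulator error:", err32)
print("float64 accumulator error:", err64)
```

The input array stays float32 in both cases, so memory traffic for the data is the same; only the running total changes width.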

> One thing I'm not on the 
> fence about is the return type: it should always match the input type, 
> or dtype if that is specified.


> Since numpy-scalars are 
> generally the results of indexing operations, not literals, I think that 
> they should behave like arrays for purposes of determining the resulting 
> precision, not like python-scalars.


>> Of course the accuracy is pretty bad at single precision, so 
>> the possible, theoretical speed advantage at large sizes probably 
>> doesn't matter.

Good point. The larger the array, the more accuracy matters.


Christopher Barker, Ph.D.
NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
