[Numpy-discussion] please change mean to use dtype=float
David M. Cooke
cookedm at physics.mcmaster.ca
Wed Sep 20 11:34:17 CDT 2006
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
> Let me offer a third path: the algorithms used for .mean() and .var() are
> substandard. There are much better incremental algorithms that entirely avoid
> the need to accumulate such large (and therefore precision-losing) intermediate
> values. The algorithms look like the following for 1D arrays in Python:
>
> def mean(a):
> m = a[0]
> for i in range(1, len(a)):
> m += (a[i] - m) / (i + 1)
> return m
This isn't really going to be any better than using a simple sum.
It'll also be slower (a division per iteration). You do avoid
accumulating large sums, but then doing the division a[i]/len(a) and
adding that will do the same.
Now, if you want to avoid losing precision, you want to use a better
summation technique, like compensated (or Kahan) summation:
def mean(a):
s = e = a.dtype.type(0)
for i in range(0, len(a)):
temp = s
y = a[i] + e
s = temp + y
e = (temp - s) + y
return s / len(a)
Some numerical experiments in Maple using 5-digit precision show that
your mean is maybe a bit better in some cases, but can also be much
worse, than sum(a)/len(a), but both are quite poor in comparision to the
Kahan summation.
(We could probably use a fast implementation of Kahan summation in
addition to a.sum())
> def var(a):
> m = a[0]
> t = a.dtype.type(0)
> for i in range(1, len(a)):
> q = a[i] - m
> r = q / (i+1)
> m += r
> t += i * q * r
> t /= len(a)
> return t
>
> Alternatively, from Knuth:
>
> def var_knuth(a):
> m = a.dtype.type(0)
> variance = a.dtype.type(0)
> for i in range(len(a)):
> delta = a[i] - m
> m += delta / (i+1)
> variance += delta * (a[i] - m)
> variance /= len(a)
> return variance
These formulas are good when you can only do one pass over the data
(like in a calculator where you don't store all the data points), but
are slightly worse than doing two passes. Kahan summation would probably
also be good here too.
--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca
More information about the Numpy-discussion
mailing list