# [Numpy-discussion] please change mean to use dtype=float

Charles R Harris charlesr.harris at gmail.com
Fri Sep 22 13:14:22 CDT 2006

```
On 9/22/06, Tim Hochberg <tim.hochberg at ieee.org> wrote:
>
> Sebastian Haase wrote:
> > On Thursday 21 September 2006 15:28, Tim Hochberg wrote:
> >
> >> David M. Cooke wrote:
> >>
> >>> On Thu, 21 Sep 2006 11:34:42 -0700
> >>>
> >>> Tim Hochberg <tim.hochberg at ieee.org> wrote:
> >>>
> >>>> Tim Hochberg wrote:
> >>>>
> >>>>> Robert Kern wrote:
> >>>>>
> >>>>>> David M. Cooke wrote:
> >>>>>>
> >>>>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
> >>>>>>>
> >>>>>>>> Let me offer a third path: the algorithms used for .mean() and
> >>>>>>>> .var() are substandard. There are much better incremental algorithms
> >>>>>>>> that entirely avoid the need to accumulate such large (and therefore
> >>>>>>>> precision-losing) intermediate values. The algorithms look like the
> >>>>>>>> following for 1D arrays in Python:
> >>>>>>>>
> >>>>>>>> def mean(a):
> >>>>>>>>      m = a[0]
> >>>>>>>>      for i in range(1, len(a)):
> >>>>>>>>          m += (a[i] - m) / (i + 1)
> >>>>>>>>      return m
> >>>>>>>>
> >>>>>>> This isn't really going to be any better than using a simple sum.
> >>>>>>> It'll also be slower (a division per iteration).
> >>>>>>>
> >>>>>> With one exception, every test that I've thrown at it shows that it's
> >>>>>> better for float32. That exception is uniformly spaced arrays, like
> >>>>>> linspace().
> >>>>>>
> >>>>>>  > You do avoid
> >>>>>>  > accumulating large sums, but then doing the division a[i]/len(a)
> >>>>>>  > and adding that will do the same.
> >>>>>>
> >>>>>> Okay, this is true.
> >>>>>>
> >>>>>>
> >>>>>>> Now, if you want to avoid losing precision, you want to use a better
> >>>>>>> summation technique, like compensated (or Kahan) summation:
> >>>>>>>
> >>>>>>> def mean(a):
> >>>>>>>     s = e = a.dtype.type(0)
> >>>>>>>     for i in range(0, len(a)):
> >>>>>>>         temp = s
> >>>>>>>         y = a[i] + e
> >>>>>>>         s = temp + y
> >>>>>>>         e = (temp - s) + y
> >>>>>>>     return s / len(a)
> >>>>>>>
> >>>>>>>
> >>>>>>>> def var(a):
> >>>>>>>>      m = a[0]
> >>>>>>>>      t = a.dtype.type(0)
> >>>>>>>>      for i in range(1, len(a)):
> >>>>>>>>          q = a[i] - m
> >>>>>>>>          r = q / (i+1)
> >>>>>>>>          m += r
> >>>>>>>>          t += i * q * r
> >>>>>>>>      t /= len(a)
> >>>>>>>>      return t
> >>>>>>>>
> >>>>>>>> Alternatively, from Knuth:
> >>>>>>>>
> >>>>>>>> def var_knuth(a):
> >>>>>>>>      m = a.dtype.type(0)
> >>>>>>>>      variance = a.dtype.type(0)
> >>>>>>>>      for i in range(len(a)):
> >>>>>>>>          delta = a[i] - m
> >>>>>>>>          m += delta / (i+1)
> >>>>>>>>          variance += delta * (a[i] - m)
> >>>>>>>>      variance /= len(a)
> >>>>>>>>      return variance
> >>>>>>>>
> >>>> I'm going to go ahead and attach a module containing the versions of
> >>>> mean, var, etc. that I've been playing with in case someone wants to mess
> >>>> with them. Some were stolen from traffic on this list; for others I
> >>>> grabbed the algorithms from Wikipedia or equivalent.
> >>>>
> >>> I looked into this a bit more. I checked float32 (single precision) and
> >>> float64 (double precision), using long doubles (float96) for the "exact"
> >>> results. This is based on your code. Results are compared using
> >>> abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values in
> >>> the range of [-100, 900].
> >>>
> >>> First, the mean. In float32, the Kahan summation in single precision is
> >>> better by about 2 orders of magnitude than simple summation. However,
> >>> accumulating the sum in double precision is better by about 9 orders of
> >>> magnitude than simple summation (7 orders more than Kahan).
> >>>
> >>> In float64, Kahan summation is the way to go, by 2 orders of magnitude.
> >>>
> >>> For the variance, in float32, Knuth's method is *no better* than the
> >>> two-pass method. Tim's code does an implicit conversion of intermediate
> >>> results to float64, which is why he saw a much better result.
> >>>
> >> Doh! And I fixed that same problem in the mean implementation earlier
> >> too. I was astounded by how well Knuth was doing, but not astounded
> >> enough, apparently.
> >>
> >> Does it seem weird to anyone else that in:
> >>     numpy_scalar <op> python_scalar
> >> the precision ends up being controlled by the python scalar? I would
> >> expect the numpy_scalar to control the resulting precision just like
> >> numpy arrays do in similar circumstances. Perhaps the egg on my face is
> >> just clouding my vision though.
> >>
> >>
> >>> The two-pass method using
> >>> Kahan summation (again, in single precision) is better by about 2 orders
> >>> of magnitude. There is practically no difference when using a
> >>> double-precision accumulator amongst the techniques: they're all about 9
> >>> orders of magnitude better than single-precision two-pass.
> >>>
> >>> In float64, Kahan summation is again better than the rest, by about 2
> >>> orders of magnitude.
> >>>
> >>> I've put my adaptation of Tim's code, and box-and-whisker plots of the
> >>> results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/
> >>>
> >>> Conclusions:
> >>>
> >>> - If you're going to calculate everything in single precision, use Kahan
> >>> summation. Using it in double precision also helps.
> >>> - If you can use a double-precision accumulator, it's much better than
> >>> any of the techniques in single-precision only.
> >>>
> >>> - For speed+precision in the variance, either use Kahan summation in
> >>> single precision with the two-pass method, or use double precision with
> >>> simple summation with the two-pass method. Knuth buys you nothing, except
> >>> slower code :-)
> >>>
> >> The two-pass methods are definitely more accurate. I won't be convinced
> >> on the speed front till I see comparable C implementations slug it out.
> >> That may well mean never in practice. However, I expect that somewhere
> >> around 10,000 items, the cache will overflow and memory bandwidth will
> >> become the bottleneck. At that point the extra operations of Knuth won't
> >> matter as much as making two passes through the array, and Knuth will win
> >> on speed. Of course, the accuracy is pretty bad at single precision, so
> >> the possible theoretical speed advantage at large sizes probably
> >> doesn't matter.
> >>
> >> -tim
> >>
> >>
> >>> After 1.0 is out, we should look at doing one of the above.
> >>>
> >> +1
> >>
> >>
> > Hi,
> > I don't understand much of these "fancy algorithms", but does the above
> > mean that after 1.0 is released the float32 functions will catch up in
> > terms of precision ("to some extent") with using dtype=float64?
> >
> It sounds like there is some consensus to do something to improve the
> precision after 1.0 comes out. I don't think the details are entirely
> nailed down, but it sounds like some combination of using Kahan
> summation and increasing the minimum precision of the accumulator is
> likely for sum, mean, var and std.

I would go with double as the default except when the base precision is
greater. Higher precisions should be available as a check, i.e., if the
results look funny, use Kahan or extended to see if roundoff is a problem. I
think Kahan should be available because it adds precision to *any* base
precision, even extended or quad, so there is always something to check
against. But I don't think Kahan should be the default because of the speed
hit.

Chuck
```
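The two techniques benchmarked in the quoted thread, compensated (Kahan) summation and a two-pass variance built on top of it, can be sketched in a few lines of pure Python. This is an illustrative sketch only, not NumPy's actual implementation; the function names are invented for the example, and plain Python floats (i.e., doubles) stand in for the float32 arrays discussed above.

```python
def kahan_sum(values):
    """Compensated summation: carry the rounding error forward in e."""
    s = 0.0
    e = 0.0  # running compensation for low-order bits lost to rounding
    for x in values:
        y = x + e        # fold the previous error back into the next term
        temp = s
        s = temp + y
        e = (temp - s) + y  # recovers what the addition just rounded away
    return s

def kahan_mean(values):
    return kahan_sum(values) / len(values)

def two_pass_var(values):
    """Two-pass variance: compute the mean first, then sum squared deviations."""
    m = kahan_mean(values)
    return kahan_sum([(x - m) ** 2 for x in values]) / len(values)

# A pathological case: one large value followed by a million tiny ones.
data = [1.0] + [1e-16] * 10**6
naive = sum(data)             # each 1e-16 is absorbed into 1.0: result stays 1.0
compensated = kahan_sum(data) # recovers the ~1e-10 the tiny terms add up to
```

The key line is `e = (temp - s) + y`: because `s = temp + y` is rounded, `(temp - s) + y` is (exactly, in most cases) the part of `y` that the addition dropped, which the next iteration folds back in. This is why, as noted in the thread, Kahan summation adds precision on top of *any* base precision.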