# [Numpy-discussion] please change mean to use dtype=float

Charles R Harris charlesr.harris at gmail.com
Fri Sep 22 09:43:53 CDT 2006

```Hi,

On 9/22/06, Tim Hochberg <tim.hochberg at ieee.org> wrote:
>
> Tim Hochberg wrote:
> > David M. Cooke wrote:
> >
> >> On Thu, 21 Sep 2006 11:34:42 -0700
> >> Tim Hochberg <tim.hochberg at ieee.org> wrote:
> >>
> >>
> >>
> >>> Tim Hochberg wrote:
> >>>
> >>>
> >>>> Robert Kern wrote:
> >>>>
> >>>>
> >>>>
> >>>>> David M. Cooke wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> Let me offer a third path: the algorithms used for .mean() and
> .var()
> >>>>>>> are substandard. There are much better incremental algorithms that
> >>>>>>> entirely avoid the need to accumulate such large (and therefore
> >>>>>>> precision-losing) intermediate values. The algorithms look like
> the
> >>>>>>> following for 1D arrays in Python:
> >>>>>>>
> >>>>>>> def mean(a):
> >>>>>>>      m = a[0]
> >>>>>>>      for i in range(1, len(a)):
> >>>>>>>          m += (a[i] - m) / (i + 1)
> >>>>>>>      return m
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>> This isn't really going to be any better than using a simple sum.
> >>>>>> It'll also be slower (a division per iteration).
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> With one exception, every test that I've thrown at it shows that
> it's
> >>>>> better for float32. That exception is uniformly spaced arrays, like
> >>>>> linspace().
> >>>>>
> >>>>>  > You do avoid
> >>>>>  > accumulating large sums, but then doing the division a[i]/len(a)
> and
> >>>>>  > adding that will do the same.
> >>>>>
> >>>>> Okay, this is true.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Now, if you want to avoid losing precision, you want to use a
> better
> >>>>>> summation technique, like compensated (or Kahan) summation:
> >>>>>>
> >>>>>> def mean(a):
> >>>>>>     s = e = a.dtype.type(0)
> >>>>>>     for i in range(0, len(a)):
> >>>>>>         temp = s
> >>>>>>         y = a[i] + e
> >>>>>>         s = temp + y
> >>>>>>         e = (temp - s) + y
> >>>>>>     return s / len(a)
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>> def var(a):
> >>>>>>>      m = a[0]
> >>>>>>>      t = a.dtype.type(0)
> >>>>>>>      for i in range(1, len(a)):
> >>>>>>>          q = a[i] - m
> >>>>>>>          r = q / (i+1)
> >>>>>>>          m += r
> >>>>>>>          t += i * q * r
> >>>>>>>      t /= len(a)
> >>>>>>>      return t
> >>>>>>>
> >>>>>>> Alternatively, from Knuth:
> >>>>>>>
> >>>>>>> def var_knuth(a):
> >>>>>>>      m = a.dtype.type(0)
> >>>>>>>      variance = a.dtype.type(0)
> >>>>>>>      for i in range(len(a)):
> >>>>>>>          delta = a[i] - m
> >>>>>>>          m += delta / (i+1)
> >>>>>>>          variance += delta * (a[i] - m)
> >>>>>>>      variance /= len(a)
> >>>>>>>      return variance
> >>>>>>>
> >>>>>>>
> >>> I'm going to go ahead and attach a module containing the versions of
> >>> mean, var, etc that I've been playing with in case someone wants to
> mess
> >>> with them. Some were stolen from traffic on this list, for others I
> >>> grabbed the algorithms from wikipedia or equivalent.
> >>>
> >>>
> >> I looked into this a bit more. I checked float32 (single precision) and
> >> float64 (double precision), using long doubles (float96) for the
> "exact"
> >> results. This is based on your code. Results are compared using
> >> abs(exact_stat - computed_stat) / max(abs(values)), with 10000 values
> in the
> >> range of [-100, 900]
> >>
> >> First, the mean. In float32, the Kahan summation in single precision is
> >> better by about 2 orders of magnitude than simple summation. However,
> >> accumulating the sum in double precision is better by about 9 orders of
> >> magnitude than simple summation (7 orders more than Kahan).
> >>
> >> In float64, Kahan summation is the way to go, by 2 orders of magnitude.
> >>
> >> For the variance, in float32, Knuth's method is *no better* than the
> two-pass
> >> method. Tim's code does an implicit conversion of intermediate results
> to
> >> float64, which is why he saw a much better result.
> >>
> > Doh! And I fixed that same problem in the mean implementation earlier
> > too. I was astounded by how good knuth was doing, but not astounded
> > enough apparently.
> >
> > Does it seem weird to anyone else that in:
> >     numpy_scalar <op> python_scalar
> > the precision ends up being controlled by the python scalar? I would
> > expect the numpy_scalar to control the resulting precision just like
> > numpy arrays do in similar circumstances. Perhaps the egg on my face is
> > just clouding my vision though.
> >
> >
> >> The two-pass method using
> >> Kahan summation (again, in single precision), is better by about 2
> orders of
> >> magnitude. There is practically no difference when using a
> double-precision
> >> accumulator amongst the techniques: they're all about 9 orders of
> magnitude
> >> better than single-precision two-pass.
> >>
> >> In float64, Kahan summation is again better than the rest, by about 2
> orders
> >> of magnitude.
> >>
> >> I've put my adaptation of Tim's code, and box-and-whisker plots of the
> >> results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/
> >>
> >> Conclusions:
> >>
> >> - If you're going to calculate everything in single precision, use
> Kahan
> >> summation. Using it in double-precision also helps.
> >> - If you can use a double-precision accumulator, it's much better than
> any of
> >> the techniques in single-precision only.
> >>
> >> - for speed+precision in the variance, either use Kahan summation in
> single
> >> precision with the two-pass method, or use double precision with simple
> >> summation with the two-pass method. Knuth buys you nothing, except
> slower
> >> code :-)
> >>
> >>
> > The two pass methods are definitely more accurate. I won't be convinced
> > on the speed front till I see comparable C implementations slug it out.
> > That may well mean never in practice. However, I expect that somewhere
> > around 10,000 items, the cache will overflow and memory bandwidth will
> > become the bottleneck. At that point the extra operations of Knuth won't
> > matter as much as making two passes through the array and Knuth will win
> > on speed.
> OK, let me take this back. I looked at this speed effect a little more
> and the effect is smaller than I remember. For example, "a+=a" slows
> down by about a factor of 2.5 on my box between 10,000 and 100,0000
> elements. That's not insignificant, but assuming (naively) that this
> means that memory effects account to the equivalent of 1.5 add
> operations, this doesn't come close to closing to gap between Knuth and
> the two pass approach. The other thing is that while a+=a and similar
> operations show this behaviour, a.sum() and add.reduce(a) do not,
> hinting that perhaps it's only writing to memory that is a bottleneck.
> Or perhaps hinting that my mental model of what's going on here is badly
> flawed.
>
> So, +1 for me on Kahan summation for computing means and two pass with
> Kahan summation for variances. It would probably be nice to expose the
> Kahan sum and maybe even the raw_kahan_sum somewhere. I can see use for
> the latter in summing a bunch of disparate matrices with high precision.
>
> I'm on the fence on using the array dtype for the accumulator dtype
> versus always using at least double precision for the accumulator.

I keep going back and forth on this myself. So I ask myself what I do in
practice. In practice, I accumulate in doubles, sometimes extended
precision. I only think of accumulating in floats when the number of items
is small and computation might be noticeably faster -- and these days it
seldom is. The days of the Vax and a 2x speed difference are gone, now we
generally have FPUs whose base type is double with float tacked on as an
afterthought. The only lesson from those days that remains pertinent is:
never, ever, do division.

So in practice, I would go with doubles. The only good reason to use floats
for floats is a sort of pure consistency.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://projects.scipy.org/pipermail/numpy-discussion/attachments/20060922/5eaf1bba/attachment-0001.html
```