[Numpy-discussion] calculating the mean and variance of a large float vector

Keith Goodman kwgoodman@gmail....
Thu Jun 5 21:16:30 CDT 2008

On Thu, Jun 5, 2008 at 6:55 PM, Alan McIntyre <alan.mcintyre@gmail.com> wrote:
> On Thu, Jun 5, 2008 at 9:06 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
>> On Thu, Jun 5, 2008 at 4:54 PM, Christopher Marshall
>> Are you worried that the mean might overflow on the intermediate sum?
> I suspect (but please correct me if I'm wrong, Christopher) he's
> asking whether there are cases where small variations in the contents of
> the vector can produce relatively large changes in the value given as
> the mean or variance.  This is a wild guess, but if the intermediate
> sums are large enough, you could have a situation where (for example)
> the last half-million values aren't counted in the intermediate sum
> because they're too small relative to the intermediate sum.  (I hope
> my numerics prof from last year doesn't read this list...I should
> really have no trouble figuring out the condition number for mean/var
> :).

How can that lead to instability? If the last half-million values are
too small to register in the intermediate sum, then they won't have a
big impact on the mean even if they are ignored. The variance is a mean
too (of the squared deviations from the mean), so it should be stable
too. Or am I, once again, missing the point?
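
For anyone who wants to measure the effect rather than guess, here is a
minimal sketch of the scenario quoted above: a running total that grows
so large that later, smaller values no longer change it. It uses plain
Python doubles and the standard library only. The data (one value of
1e16 followed by a million ones) and the helper name naive_sum are
made up for illustration; nothing here is taken from a real data set
or from how NumPy accumulates internally.

```python
import math

def naive_sum(values):
    """Left-to-right accumulation into a single double-precision total."""
    total = 0.0
    for v in values:
        total += v
    return total

# Hypothetical data: one large value followed by a million small ones.
# Doubles near 1e16 are spaced 2 apart, so adding 1.0 to a total of
# 1e16 rounds straight back to 1e16.
values = [1e16] + [1.0] * 1_000_000

naive_total = naive_sum(values)
exact_total = math.fsum(values)  # correctly rounded sum of the same data

# How much of the input never made it into the naive total.
lost = exact_total - naive_total
rel_err = lost / exact_total

print("naive sum     :", naive_total)
print("math.fsum     :", exact_total)
print("lost          :", lost)
print("relative error:", rel_err)
```

In this setup the naive loop drops all million small values, yet the
relative error in the sum (and so in the mean) is only about 1e-10.
That is far larger than double-precision rounding error, but it is
consistent with the argument that the ignored values have little
impact on the mean.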
