[Numpy-discussion] calculating the mean and variance of a large float vector

Charles R Harris charlesr.harris@gmail....
Thu Jun 5 21:20:13 CDT 2008


On Thu, Jun 5, 2008 at 7:55 PM, Alan McIntyre <alan.mcintyre@gmail.com>
wrote:

> On Thu, Jun 5, 2008 at 9:06 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> > On Thu, Jun 5, 2008 at 4:54 PM, Christopher Marshall
> > Are you worried that the mean might overflow on the intermediate sum?
>
> I suspect (but please correct me if I'm wrong, Christopher) he's
> asking whether there are cases where small variations in the contents of
> the vector can produce relatively large changes in the value given as
> the mean or variance.  This is a wild guess, but if the intermediate
> sums are large enough, you could have a situation where (for example)
> the last half-million values aren't counted because they're too small
> relative to the running sum.  (I hope
> my numerics prof from last year doesn't read this list...I should
> really have no trouble figuring out the condition number for mean/var
> :).
>
> What kinds of values are in your vectors, Christopher?  If nobody has
> a sure answer for stability of mean/var, I'll see if I can figure it
> out.
>

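The effect Alan describes is easy to reproduce in single precision. The
sketch below is only an illustration: naive_sum is a hypothetical helper
(not a NumPy function), and the input is contrived so that the running sum
reaches 2**24, the point where float32 can no longer represent the next
integer.

```python
import numpy as np

def naive_sum(values, dtype=np.float32):
    """Left-to-right accumulation in the given precision (hypothetical helper)."""
    acc = dtype(0)
    for v in values:
        # Round the running sum back to the working precision after every add.
        acc = dtype(acc + dtype(v))
    return acc

# One large value followed by many small ones (contrived example data).
values = [2.0 ** 24] + [1.0] * 1000

naive32 = naive_sum(values, np.float32)  # every 1.0 is absorbed by the running sum
naive64 = naive_sum(values, np.float64)  # double precision has room to spare here
```

In float32 the thousand trailing ones contribute nothing, while float64 keeps
all of them. The same thing happens in double precision only once the running
sum is around 2**53 times larger than the values being added.
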
If it is a real concern, use pairwise summation: add adjacent pairs, then add
adjacent results, and so on until a single value remains. This is essentially
how an FFT computes its DC term, which is the sum of all the samples. For that
extra fillip, add the two smallest values, put the sum back in the list, add
the two new smallest, and so on. But several million values isn't that many
when done in double precision. The real question is: how much accuracy is
needed? Then design the algorithm to fit the need.
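
A minimal sketch of both ideas, in case it helps. The function names are
made up for illustration, "smallest" is assumed to mean smallest in
magnitude, and the variance is a plain two-pass population variance (divide
by n), which is a choice made here and not something settled in this thread.

```python
import heapq
import numpy as np

def pairwise_sum(x):
    """Add adjacent pairs, then adjacent results, until one value remains."""
    x = np.asarray(x, dtype=np.float64)
    if x.size == 0:
        return 0.0
    while x.size > 1:
        if x.size % 2:
            # Odd length: pad with a zero so every element has a partner.
            x = np.append(x, 0.0)
        x = x[0::2] + x[1::2]
    return float(x[0])

def smallest_first_sum(values):
    """Repeatedly add the two smallest-magnitude values and push the sum back."""
    heap = [(abs(float(v)), float(v)) for v in values]
    if not heap:
        return 0.0
    heapq.heapify(heap)
    while len(heap) > 1:
        _, a = heapq.heappop(heap)
        _, b = heapq.heappop(heap)
        s = a + b
        heapq.heappush(heap, (abs(s), s))
    return heap[0][1]

def mean_and_variance(x):
    """Two-pass mean and population variance built on pairwise_sum."""
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    mean = pairwise_sum(x) / n
    # Second pass: sum squared deviations from the mean computed above.
    var = pairwise_sum((x - mean) ** 2) / n
    return mean, var
```

The pairwise version stays vectorized, so it costs little more than a plain
sum. The heap version is O(n log n) in pure Python and is only worth it when
the values span many orders of magnitude.
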

Chuck