# [Numpy-discussion] calculating the mean and variance of a large float vector

Bruce Southey bsouthey@gmail....
Fri Jun 6 09:16:32 CDT 2008

```Bruce Southey wrote:
> Alan McIntyre wrote:
>> On Thu, Jun 5, 2008 at 10:16 PM, Keith Goodman <kwgoodman@gmail.com>
>> wrote:
>>
>>> How can that lead to instability? If the last half-million values are
>>> small then they won't have a big impact on the mean even if they are
>>> ignored. The variance is a mean too (of the squares), so it should be
>>> stable too. Or am I, once again, missing the point?
>>>
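# [Editorial sketch, not from the thread: the question above is about
# computing the mean of a large vector in pieces. The overall mean can
# be recovered from per-chunk sums, so chunked processing by itself
# need not lose accuracy.]
import numpy as np

x = np.arange(1_000_000, dtype=np.float64)
chunks = np.array_split(x, 4)                    # process in 4 pieces
combined = sum(c.sum() for c in chunks) / x.size
print(combined, x.mean())                        # the two agree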
>>
>> No, I just didn't think about it long enough, and I shouldn't have
>> tried to make an example off the cuff. ;)   After thinking about it
>> again, I think some loss of accuracy is probably the worst that can
>> happen.
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion@scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
> Any problems will mainly be due to the distribution of the numbers,
> especially if very small and very large values are mixed. This is
> mitigated by numerical precision and by the algorithm - my guess is
> that it will take a rather extreme case to cause you any problems.
>
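# [Editorial sketch of the kind of "extreme case" meant here, not from
# the thread: when values share a large offset, the textbook shortcut
# E[x^2] - E[x]^2 cancels catastrophically even in float64, while
# NumPy's np.var, which subtracts the mean first, stays accurate.]
import numpy as np

x = np.array([1e8, 1e8 + 1.0])
naive = np.mean(x**2) - np.mean(x)**2   # catastrophic cancellation
stable = np.var(x)                      # subtracts the mean first
print(naive, stable)                    # naive is badly wrong; stable is 0.25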
> Python and NumPy already use high numerical precision by default
> (which may depend on architecture), and NumPy defines 32-bit, 64-bit
> and 128-bit precision if you want to go higher (or lower). This means
> that calculations are rather insensitive to the numbers used, so
> typically there is no reason for concern (ignoring the old Pentium
> FDIV bug, http://en.wikipedia.org/wiki/Pentium_FDIV_bug ).
>
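# [Editorial note: the available precisions can be inspected with
# np.finfo. The "128-bit" type corresponds to np.longdouble, which on
# most x86 platforms is actually 80-bit extended precision.]
import numpy as np

for t in (np.float32, np.float64, np.longdouble):
    info = np.finfo(t)
    print(t.__name__, info.eps)   # eps: smallest representable spacing at 1.0
# float32 resolves roughly 7 decimal digits, float64 roughly 16.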
> The second issue is the algorithm, where you need to balance
> performance against precision. For the standard approaches, see:
> http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
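# [Editorial sketch: one algorithm from that page is Welford's one-pass
# update, which keeps a running mean and a running sum of squared
# deviations, trading a little speed for numerical stability.]
import numpy as np

def welford_var(values):
    """One-pass (population) mean and variance via Welford's updates."""
    n = 0
    mean = 0.0
    m2 = 0.0                      # sum of squared deviations from the running mean
    for v in values:
        n += 1
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)
    return mean, m2 / n

x = np.array([1e8, 1e8 + 1.0, 1e8 + 2.0])
mean, var = welford_var(x)
print(mean, var, np.var(x))       # matches np.var even with the large offset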
>
> Bruce
>