[Numpy-discussion] calculating the mean and variance of a large float vector

Bruce Southey bsouthey@gmail....
Fri Jun 6 09:16:32 CDT 2008

Bruce Southey wrote:
> Alan McIntyre wrote:
>> On Thu, Jun 5, 2008 at 10:16 PM, Keith Goodman <kwgoodman@gmail.com> 
>> wrote:
>>> How can that lead to instability? If the last half-million values are
>>> small then they won't have a big impact on the mean even if they are
>>> ignored. The variance is a mean too (of the squares), so it should be
>>> stable too. Or am I, once again, missing the point?
>> No, I just didn't think about it long enough, and I shouldn't have
>> tried to make an example off the cuff. ;) After thinking about it
>> again, I think some loss of accuracy is probably the worst that can
>> happen.
> Any problems are going to be mainly due to the distribution of the 
> numbers, especially if there is a mix of very small and very large 
> values. This is mitigated by numerical precision and by the choice of 
> algorithm -- my guess is that it will take a rather extreme case to 
> cause you any problems. Python and NumPy already use high numerical 
> precision (which may depend on the architecture), and NumPy provides 
> 32-bit, 64-bit, and 128-bit floating-point types if you want to go 
> higher (or lower). This means calculations are rather insensitive to 
> the numbers used, so there is typically no reason for concern 
> (ignoring the old Pentium FDIV bug, 
> http://en.wikipedia.org/wiki/Pentium_FDIV_bug ).
> The second issue is the algorithm, where you need to balance 
> performance against precision. For the standard algorithms, see:
> http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
> Bruce
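
To make the instability Keith asked about concrete: the danger in a
single running sum is floating-point absorption. Once the accumulated
sum is large, values below its resolution stop contributing at all. A
quick illustration (the values here are my own, not from the thread):

import numpy
s = numpy.float32(1e8)
s + numpy.float32(1.0)  # still 1e+08: float32's spacing at this magnitude is 8, so 1.0 is absorbed

And on the algorithm side, here is a minimal sketch of Welford's
one-pass method from the Wikipedia page above; the function name and
the test data are illustrative choices, not from the thread:

def welford_mean_var(x):
    # Single pass over the data; avoids the catastrophic cancellation
    # of the naive E[x**2] - E[x]**2 formula when the mean is large
    # relative to the spread.
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for n, value in enumerate(x, start=1):
        delta = value - mean
        mean += delta / n
        m2 += delta * (value - mean)
    return mean, m2 / len(x)  # population variance, matching numpy's default ddof=0

x = 1e8 + numpy.random.randn(100000)  # large mean, small variance: the hard case
welford_mean_var(x)[1]  # close to 1.0; (x**2).mean() - x.mean()**2 can lose nearly all its digits here
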
I forgot to add:
import numpy
x = 1e305 * numpy.ones(10000000, numpy.float128)
type(x[0])  # gives <type 'numpy.float128'>
x.mean()  # gives 1.000000000000036542e+305
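
The float128 is doing real work in that example: the intermediate sum
is roughly 10**7 * 1e305 = 1e312, which exceeds float64's maximum of
about 1.8e308, so in plain float64 the sum overflows and the mean
comes out as inf. A quick check (the float64 variant is my addition,
not from the original message):

y = 1e305 * numpy.ones(10000000, numpy.float64)
y.mean()  # inf on a typical build: the running sum overflows long before the division by n

Note also that the float128 result is not exactly 1e305; the trailing
...36542 digits are accumulated rounding error in the long-double sum.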

