[Numpy-discussion] calculating the mean and variance of a large float vector
Fri Jun 6 09:16:32 CDT 2008
Bruce Southey wrote:
> Alan McIntyre wrote:
>> On Thu, Jun 5, 2008 at 10:16 PM, Keith Goodman <firstname.lastname@example.org> wrote:
>>> How can that lead to instability? If the last half-million values are
>>> small then they won't have a big impact on the mean even if they are
>>> ignored. The variance is a mean too (of the squares), so it should be
>>> stable too. Or am I, once again, missing the point?
>> No, I just didn't think about it long enough, and I shouldn't have
>> tried to make an example off the cuff. ;) After thinking about it
>> again, I think some loss of accuracy is probably the worst that can
>> Numpy-discussion mailing list
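The "variance is a mean of the squares" identity Keith refers to is exactly where accuracy can be lost: the naive one-pass formula var = mean(x**2) - mean(x)**2 subtracts two large, nearly equal numbers. A quick sketch of the effect (my example, not from the thread), using float32 data with a large mean:

```python
import numpy as np

rng = np.random.RandomState(0)
# Data with a large mean (~1e4) and unit spread, stored in float32.
x = (10000.0 + rng.randn(100000)).astype(np.float32)

# Naive one-pass formula: mean(x**2) - mean(x)**2.
# mean(x**2) is ~1e8, and float32 resolves numbers near 1e8 only to
# within ~8, so the subtraction cancels nearly every significant digit.
naive = (x ** 2).mean() - x.mean() ** 2

# Two-pass formula: subtract the mean first, then square.
two_pass = ((x - x.mean()) ** 2).mean()

# Reference computed entirely in float64.
ref = np.var(x.astype(np.float64))
print(naive, two_pass, ref)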
> Any problems are mainly going to be due to the distribution of numbers,
> especially if there are very small numbers and very large numbers.
> This is mitigated by numerical precision and the algorithm used - my guess is
> that it will take a rather extreme case to cause you any problems.
> Python and NumPy are already using high numerical precision (may
> depend on architecture) and NumPy defines 32-bit, 64-bit and 128-bit
> precision if you want to go higher (or lower). This means that
> calculations are rather insensitive to the numbers used, so typically there
> is no reason for any concern (ignoring the old Pentium FDIV bug,
> http://en.wikipedia.org/wiki/Pentium_FDIV_bug ).
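The "very small numbers and very large numbers" case is worth seeing concretely. In float32 the spacing between representable numbers near 1e8 is 8, so sequentially adding 1.0 to a float32 running total of 1e8 changes nothing. A small illustration (mine, not from the mail):

```python
import numpy as np

big = np.float32(1e8)             # ulp (spacing) at 1e8 is 8.0 in float32
small = np.ones(1000, dtype=np.float32)

# Sequential float32 accumulation: each "+ 1.0" rounds back to 1e8.
acc = big
for v in small:
    acc = np.float32(acc + v)
print(acc)        # 1e8 -- the thousand ones were absorbed

# A float64 accumulator keeps them.
exact = float(big) + small.sum(dtype=np.float64)
print(exact)      # 100001000.0
```

NumPy lets you request the wider accumulator directly, e.g. `x.mean(dtype=np.float64)`.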
> The second issue is the algorithm where you need to balance
> performance with precision. For simple calculations:
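On the algorithm side, a standard way to get one-pass performance without the naive formula's cancellation is Welford's update. This is my sketch of that trade-off, not something from the original mail:

```python
def welford_mean_var(values):
    """One-pass, numerically stable mean and population variance."""
    n = 0
    mean = 0.0
    m2 = 0.0            # running sum of squared deviations from the mean
    for v in values:
        n += 1
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)   # uses the *updated* mean
    return mean, m2 / n

mean, var = welford_mean_var([2.0, 4.0, 6.0])
print(mean, var)   # 4.0 2.666...
```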
I forgot to add:
import numpy as np
x = 1e305 * np.ones(10000000, np.float128)
x.dtype  # gives dtype('float128')
x.mean() # gives 1.000000000000036542e+305
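One reason the extended type matters here: the intermediate sum is 1e305 * 1e7 = 1e312, which exceeds the float64 maximum (~1.8e308), so the same mean in plain float64 overflows to inf. A sketch of the contrast (mine; note that float128 availability is platform dependent):

```python
import numpy as np

y = 1e305 * np.ones(10000000)        # default dtype is float64
with np.errstate(over='ignore'):     # silence the overflow warning
    m = y.mean()                     # the sum overflows past ~1.8e308
print(m)   # inf
```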