[Numpy-discussion] Does np.std() make two passes through the data?
Sun Nov 21 20:33:57 CST 2010
On Sun, Nov 21, 2010 at 5:56 PM, Robert Kern <email@example.com> wrote:
> On Sun, Nov 21, 2010 at 19:49, Keith Goodman <firstname.lastname@example.org> wrote:
>> But this sample gives a difference:
>>>> a = np.random.rand(100)
>> As you know, I'm trying to make a drop-in replacement for
>> scipy.stats.nanstd. Maybe I'll have to add an asterisk to the drop-in
>> part. Either that, or suck it up and store the damn mean.
> The difference is less than eps. Quite possibly, the one-pass version
> is even closer to the true value than the two-pass version.
Good, it passes the Kern test.
Here's an even more robust estimate:
>> var(a - a.mean())
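The mean-shift trick above can be checked directly: subtracting the sample mean before computing the variance removes most of the catastrophic cancellation, so the plain two-pass result and the shifted result agree to roughly machine precision for well-scaled data. A minimal sketch (seed and array size are arbitrary, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(100)

plain = np.var(a)               # two-pass variance on the raw data
shifted = np.var(a - a.mean())  # two-pass variance on mean-centered data

# For data of order 1, the two estimates differ by only a few ulps.
print('difference: %g' % abs(plain - shifted))
```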
Which is better, NumPy's two-pass algorithm or the one-pass online method? Mean absolute error against the mean-shifted "truth":
NumPy error: 9.31135e-18
Nanny error: 6.5745e-18 <-- One pass wins!
import numpy as np
# `var` is Nanny's one-pass variance function (the drop-in replacement
# under discussion), e.g. `from nanny import var`.

n = 10000  # number of trials (not defined in the original snippet)
numpy = 0.0
nanny = 0.0
for i in range(n):
    a = np.random.rand(10)
    truth = var(a - a.mean())  # mean-shifted estimate used as "truth"
    numpy += np.absolute(truth - a.var())
    nanny += np.absolute(truth - var(a))
print 'NumPy error: %g' % (numpy / n)
print 'Nanny error: %g' % (nanny / n)
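Nanny's one-pass `var` itself isn't shown in the thread. A standard one-pass online method it could plausibly use is Welford's algorithm; here is a self-contained sketch (the function name, seed, and trial count are my own, not from the message):

```python
import numpy as np

def onepass_var(a):
    """One-pass (Welford) estimate of the population variance (ddof=0)."""
    mean = 0.0
    m2 = 0.0
    for k, x in enumerate(a, 1):
        delta = x - mean
        mean += delta / k          # running mean
        m2 += delta * (x - mean)   # running sum of squared deviations
    return m2 / len(a)

rng = np.random.default_rng(0)
n = 1000
err_numpy = 0.0
err_onepass = 0.0
for _ in range(n):
    a = rng.random(10)
    truth = np.var(a - a.mean())   # mean-shifted two-pass "truth"
    err_numpy += abs(truth - a.var())
    err_onepass += abs(truth - onepass_var(a))
print('NumPy error:    %g' % (err_numpy / n))
print('One-pass error: %g' % (err_onepass / n))
```

Unlike the naive one-pass formula `E[x**2] - E[x]**2`, Welford's update never subtracts two large nearly-equal quantities, which is why a one-pass method can match (or beat) the two-pass result.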