# [Numpy-discussion] Does np.std() make two passes through the data?

Keith Goodman kwgoodman@gmail....
Sun Nov 21 19:49:54 CST 2010

```On Sun, Nov 21, 2010 at 4:18 PM,  <josef.pktd@gmail.com> wrote:
> On Sun, Nov 21, 2010 at 6:43 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
>> Does np.std() make two passes through the data?
>>
>> Numpy:
>>
>>>> arr = np.random.rand(10)
>>>> arr.std()
>>   0.3008736260967052
>>
>> Looks like an algorithm that makes one pass through the data (one for
>> loop) wouldn't match arr.std():
>>
>>>> np.sqrt((arr*arr).mean() - arr.mean()**2)
>>   0.30087362609670526
>>
>> But a slower two-pass algorithm would match arr.std():
>>
>>>> np.sqrt(((arr - arr.mean())**2).mean())
>>   0.3008736260967052
>>
>> Is there a way to get the same result as arr.std() in one pass (cython
>> for loop) of the data?
>
> reference several times pointed to on the list is the wikipedia page, e.g.
> http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm

Unfortunately it doesn't give the same result as numpy's two pass algorithm.

>From wikipedia:

def var(arr):
n = 0
mean = 0
M2 = 0
for x in arr:
n = n + 1
delta = x - mean
mean = mean + delta / n
M2 = M2 + delta * (x - mean)
return M2 / n

This set of random samples gives matching variance:

>> a = np.random.rand(100)
>> a.var()
0.07867478939716277
>> var(a)
0.07867478939716277

But this sample gives a difference:

>> a = np.random.rand(100)
>> a.var()
0.080232196646619805
>> var(a)
0.080232196646619791

As you know, I'm trying to make a drop-in replacement for
scipy.stats.nanstd. Maybe I'll have to add an asterisk to the drop-in
part. Either that, or suck it up and store the damn mean.
```