# [Numpy-discussion] Does np.std() make two passes through the data?

Keith Goodman kwgoodman@gmail....
Mon Nov 22 12:39:31 CST 2010

On Mon, Nov 22, 2010 at 10:32 AM,  <josef.pktd@gmail.com> wrote:
> On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
>> On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
>>
>>> @cython.boundscheck(False)
>>> @cython.wraparound(False)
>>> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof):
>>>    "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
>>>    cdef Py_ssize_t i
>>>    cdef int a0 = a.shape[0], count = 0
>>>    cdef np.float64_t asum = 0, a2sum=0, amean, ai, da
>>>    for i in range(a0):
>>>        ai = a[i]
>>>        if ai == ai:
>>>            asum += ai
>>>            count += 1
>>>    if count > 0:
>>>        amean = asum / count
>>>        asum = 0
>>>        for i in range(a0):
>>>            ai = a[i]
>>>            if ai == ai:
>>>                da = ai - amean
>>>                asum += da
>>>                a2sum += (da * da)
>>>        asum = asum * asum
>>>        return sqrt((a2sum - asum / count) / (count - ddof))
>>>    else:
>>>        return np.float64(NAN)
>>
>> This is 5% faster:
>>
>> @cython.boundscheck(False)
>> @cython.wraparound(False)
>> def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof):
>>    "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
>>    cdef Py_ssize_t i
>>    cdef int a0 = a.shape[0], count = 0
>>    cdef np.float64_t asum = 0, amean, ai
>>    for i in range(a0):
>>        ai = a[i]
>>        if ai == ai:
>>            asum += ai
>>            count += 1
>>    if count > 0:
>>        amean = asum / count
>>        asum = 0
>>        for i in range(a0):
>>            ai = a[i]
>>            if ai == ai:
>>                ai -= amean
>>                asum += (ai * ai)
>>        return sqrt(asum / (count - ddof))
>>    else:
>>        return np.float64(NAN)
>
> I think it would be better to write nanvar instead of nanstd and take
> the square root only in a delegating nanstd, instead of the other way
> around. (Also a change that should be made in scipy.stats)

Yeah, I noticed that numpy does that. I was planning to have separate
var and std functions. Here's why (from the readme file, but maybe I
should template it, the sqrt automatically converts large ddof to
NaN):

Under the hood Nanny uses a separate Cython function for each
combination of ndim, dtype, and axis. A lot of the overhead in
ny.nanmax, for example, is in checking that your axis is within range,
converting non-array data to an array, and selecting the function to
use to calculate nanmax.

You can get rid of the overhead by doing all this before you, say,
enter an inner loop:

>>> arr = np.random.rand(10,10)
>>> axis = 0
>>> func, a = ny.func.nanmax_selector(arr, axis)
>>> func.__name__
'nanmax_2d_float64_axis0'

Let's see how much faster than runs:

>> timeit np.nanmax(arr, axis=0)
10000 loops, best of 3: 25.7 us per loop
>> timeit ny.nanmax(arr, axis=0)
100000 loops, best of 3: 5.25 us per loop
>> timeit func(a)
100000 loops, best of 3: 2.5 us per loop

Note that func is faster than the Numpy's non-nan version of max:

>> timeit arr.max(axis=0)
100000 loops, best of 3: 3.28 us per loop

So adding NaN protection to your inner loops has a negative cost!