[Numpy-discussion] Does np.std() make two passes through the data?
Keith Goodman
kwgoodman@gmail....
Mon Nov 22 12:59:51 CST 2010
On Mon, Nov 22, 2010 at 10:51 AM, <josef.pktd@gmail.com> wrote:
> On Mon, Nov 22, 2010 at 1:39 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
>> On Mon, Nov 22, 2010 at 10:32 AM, <josef.pktd@gmail.com> wrote:
>>> On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
>>>> On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
>>>>
>>>>> @cython.boundscheck(False)
>>>>> @cython.wraparound(False)
>>>>> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof):
>>>>> "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
>>>>> cdef Py_ssize_t i
>>>>> cdef int a0 = a.shape[0], count = 0
>>>>> cdef np.float64_t asum = 0, a2sum=0, amean, ai, da
>>>>> for i in range(a0):
>>>>> ai = a[i]
>>>>> if ai == ai:
>>>>> asum += ai
>>>>> count += 1
>>>>> if count > 0:
>>>>> amean = asum / count
>>>>> asum = 0
>>>>> for i in range(a0):
>>>>> ai = a[i]
>>>>> if ai == ai:
>>>>> da = ai - amean
>>>>> asum += da
>>>>> a2sum += (da * da)
>>>>> asum = asum * asum
>>>>> return sqrt((a2sum - asum / count) / (count - ddof))
>>>>> else:
>>>>> return np.float64(NAN)
>>>>
>>>> This is 5% faster:
>>>>
>>>> @cython.boundscheck(False)
>>>> @cython.wraparound(False)
>>>> def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof):
>>>> "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
>>>> cdef Py_ssize_t i
>>>> cdef int a0 = a.shape[0], count = 0
>>>> cdef np.float64_t asum = 0, amean, ai
>>>> for i in range(a0):
>>>> ai = a[i]
>>>> if ai == ai:
>>>> asum += ai
>>>> count += 1
>>>> if count > 0:
>>>> amean = asum / count
>>>> asum = 0
>>>> for i in range(a0):
>>>> ai = a[i]
>>>> if ai == ai:
>>>> ai -= amean
>>>> asum += (ai * ai)
>>>> return sqrt(asum / (count - ddof))
>>>> else:
>>>> return np.float64(NAN)
>>>
>>> I think it would be better to write nanvar instead of nanstd and take
>>> the square root only in a delegating nanstd, instead of the other way
>>> around. (Also a change that should be made in scipy.stats)
>>
>> Yeah, I noticed that numpy does that. I was planning to have separate
>> var and std functions. Here's why (from the readme file, but maybe I
>> should template it, the sqrt automatically converts large ddof to
>> NaN):
>
> I'm not sure what you are saying, dropping the squareroot in the
> function doesn't require nan handling in the inner loop. If you want
> to return nan when count-ddof<=0, then you could just replace
>
> if count > 0:
> ...
>
> by
> if count -ddof > 0:
> ...
>
> Or am I missing the point?
Yes, sorry. Ignore the sqrt/nan comment. The point is that I want to
be able to return the underlying, low-level cython function (see
below). So I need separate var and std versions to do that (unless I
can modify default kwargs on the fly).
>> Under the hood Nanny uses a separate Cython function for each
>> combination of ndim, dtype, and axis. A lot of the overhead in
>> ny.nanmax, for example, is in checking that your axis is within range,
>> converting non-array data to an array, and selecting the function to
>> use to calculate nanmax.
>>
>> You can get rid of the overhead by doing all this before you, say,
>> enter an inner loop:
>>
>>>>> arr = np.random.rand(10,10)
>>>>> axis = 0
>>>>> func, a = ny.func.nanmax_selector(arr, axis)
>>>>> func.__name__
>> 'nanmax_2d_float64_axis0'
>>
>> Let's see how much faster than runs:
>>
>>>> timeit np.nanmax(arr, axis=0)
>> 10000 loops, best of 3: 25.7 us per loop
>>>> timeit ny.nanmax(arr, axis=0)
>> 100000 loops, best of 3: 5.25 us per loop
>>>> timeit func(a)
>> 100000 loops, best of 3: 2.5 us per loop
>>
>> Note that func is faster than the Numpy's non-nan version of max:
>>
>>>> timeit arr.max(axis=0)
>> 100000 loops, best of 3: 3.28 us per loop
>>
>> So adding NaN protection to your inner loops has a negative cost!
More information about the NumPy-Discussion
mailing list