[Numpy-discussion] Does np.std() make two passes through the data?
josef.pktd@gmai...
josef.pktd@gmai...
Mon Nov 22 12:32:37 CST 2010
On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
>
>> @cython.boundscheck(False)
>> @cython.wraparound(False)
>> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof):
>> "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
>> cdef Py_ssize_t i
>> cdef int a0 = a.shape[0], count = 0
>> cdef np.float64_t asum = 0, a2sum=0, amean, ai, da
>> for i in range(a0):
>> ai = a[i]
>> if ai == ai:
>> asum += ai
>> count += 1
>> if count > 0:
>> amean = asum / count
>> asum = 0
>> for i in range(a0):
>> ai = a[i]
>> if ai == ai:
>> da = ai - amean
>> asum += da
>> a2sum += (da * da)
>> asum = asum * asum
>> return sqrt((a2sum - asum / count) / (count - ddof))
>> else:
>> return np.float64(NAN)
>
> This is 5% faster:
>
> @cython.boundscheck(False)
> @cython.wraparound(False)
> def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof):
> "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
> cdef Py_ssize_t i
> cdef int a0 = a.shape[0], count = 0
> cdef np.float64_t asum = 0, amean, ai
> for i in range(a0):
> ai = a[i]
> if ai == ai:
> asum += ai
> count += 1
> if count > 0:
> amean = asum / count
> asum = 0
> for i in range(a0):
> ai = a[i]
> if ai == ai:
> ai -= amean
> asum += (ai * ai)
> return sqrt(asum / (count - ddof))
> else:
> return np.float64(NAN)
I think it would be better to write nanvar instead of nanstd and take
the square root only in a delegating nanstd, instead of the other way
around. (Also a change that should be made in scipy.stats)
Josef
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
More information about the NumPy-Discussion
mailing list