[SciPy-Dev] stats.nanstd interface
Bruce Southey
bsouthey@gmail....
Wed Jun 16 10:17:42 CDT 2010
On 06/16/2010 09:20 AM, josef.pktd@gmail.com wrote:
> On Wed, Jun 16, 2010 at 10:02 AM, Bruce Southey<bsouthey@gmail.com> wrote:
>
>> On 06/16/2010 07:55 AM, Angus McMorland wrote:
>>
>>> Hi all,
>>>
>>> I've just updated the docstring for scipy.stats.nanstd to the new
>>> docstring standard's format. I wonder if, for consistency of
>>> interface, we should consider changing it to use a `ddof` parameter,
>>> as numpy's std function does, instead of its current `bias` boolean
>>> parameter. I'm aware that there are deprecation/API implications
>>> associated with this, but I'm not sure what the specifics of those
>>> are.
>>>
>>> Angus.
>>>
>>>
>> Please file a ticket for it.
>> Can you please list all the differences between the signature of
>> numpy's version and this version?
>> In particular, the default axis of stats.nanstd is zero compared to None.
>> It also lacks the dtype argument.
>>
> default axis in scipy.stats is zero not None as in numpy.
> np.nansum has no dtype argument, nans can be only in float (I never
> checked complex for this), so I don't know whether dtype would be
> useful in this case.
>
From the np.std docstring:
"
dtype : dtype, optional
    Type to use in computing the standard deviation. For arrays of
    integer type the default is float64, for arrays of float types
    it is the same as the array type.
"
>
>> Really the function needs at least a rewrite unless numpy can provide
>> same functionality.
>>
> Can you be more specific, we just rewrote axis handling
>
> I think switching to ddof is a good idea. (FYI: I cannot work on
> anything for another two weeks).
>
> Josef
>
>
I know that the broadcasting is not correct in the following, but I do
not know how to fix it.
Also, np.nansum does not accept a dtype argument, so the input needs to
be converted to the new precision first.
I would like it to handle other array subtypes, or at least fail
explicitly on inputs like masked arrays, the Matrix class, etc.
Perhaps something like this works:
import numpy as np
import scipy.stats as stats

def stdnan(x, axis=None, dtype=None, ddof=0):
    # only convert if desired input is better than the default float64 dtype
    if dtype == np.float128:
        x = np.array(x, dtype=dtype)
    denom = np.isfinite(x).sum(axis=axis)  # number of finite numbers
    # This is not correct because the broadcasting is wrong for axis > 0
    mean = np.nansum(x, axis=axis) / denom
    diff = x - mean  # x minus the mean - which must broadcast correctly
    return np.sqrt(np.nansum(diff*diff, axis=axis) / (denom - ddof))

a = np.array([[1, 2, 3], [4, np.nan, 5], [6, 7, np.nan]])
print 'stdnan=:', stdnan(a, axis=None), 'stats.nanstd=:', stats.nanstd(a, axis=None, bias=1)
print 'stdnan=:', stdnan(a, axis=None, ddof=1), 'stats.nanstd=:', stats.nanstd(a, axis=None, bias=0)
print 'stdnan=:', stdnan(a, axis=0), 'stats.nanstd=:', stats.nanstd(a, axis=0, bias=1)
print 'stdnan=:', stdnan(a, axis=0, ddof=1), 'stats.nanstd=:', stats.nanstd(a, axis=0, bias=0)
print 'The following is wrong because the broadcasting is not correct when computing the difference'
print 'stdnan=:', stdnan(a, axis=1), 'stats.nanstd=:', stats.nanstd(a, axis=1, bias=1)
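
One way the broadcasting for axis > 0 might be fixed, as a sketch only:
re-insert the reduced axis into the mean with np.expand_dims so that it
lines up with the input again. The name nanstd_sketch is hypothetical and
this is not the scipy.stats implementation; it counts non-NaN entries
rather than finite ones and leaves out the dtype handling.

```python
import numpy as np

def nanstd_sketch(x, axis=None, ddof=0):
    """Hypothetical NaN-aware standard deviation (illustration only)."""
    x = np.asarray(x, dtype=float)
    count = (~np.isnan(x)).sum(axis=axis)    # non-NaN entries per slice
    mean = np.nansum(x, axis=axis) / count
    if axis is not None:
        # Put the reduced axis back with length 1 so that the mean
        # broadcasts against x along the correct dimension.
        mean = np.expand_dims(mean, axis)
    diff = x - mean                          # NaNs stay NaN here
    return np.sqrt(np.nansum(diff * diff, axis=axis) / (count - ddof))

a = np.array([[1, 2, 3], [4, np.nan, 5], [6, 7, np.nan]])
row_std = nanstd_sketch(a, axis=1)   # one value per row
print(row_std)
```

In NumPy versions that provide np.nanstd, the axis=1 result can be
checked against np.nanstd(a, axis=1).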
Bruce