[SciPy-Dev] stats.nanstd interface
Wed Jun 16 10:17:42 CDT 2010
On 06/16/2010 09:20 AM, firstname.lastname@example.org wrote:
> On Wed, Jun 16, 2010 at 10:02 AM, Bruce Southey<email@example.com> wrote:
>> On 06/16/2010 07:55 AM, Angus McMorland wrote:
>>> Hi all,
>>> I've just updated the docstring for scipy.stats.nanstd to the new
>>> docstring standard's format. I wonder if, for consistency of
>>> interface, we should consider changing it to use a `ddof` parameter,
>>> as numpy's std function does, instead of its current `bias` boolean
>>> parameter. I'm aware that there are deprecation/API implications
>>> associated with this, but I'm not sure what the specifics of those
>>> are.
>> Please file a ticket for it.
>> Can you please add all the differences between the signature of
>> numpy's version and this version?
>> In particular, the default axis of stats.nanstd is zero compared to None.
>> It also lacks the dtype argument.
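To make the axis difference concrete, here is a small sketch that uses
only np.std (stats.nanstd itself is not called). Passing axis=0
explicitly mimics the scipy.stats default described above; the array
values are made up for illustration.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# numpy default (axis=None): a single value over the flattened array
flat = np.std(a)

# scipy.stats-style default (axis=0): one value per column
per_column = np.std(a, axis=0)
```

The same call therefore returns a scalar in one convention and an array
of length 3 in the other, which is why the default matters for callers.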
> default axis in scipy.stats is zero not None as in numpy.
> np.nansum has no dtype argument, nans can be only in float (I never
> checked complex for this), so I don't know whether dtype would be
> useful in this case.
From np.std docstring:
dtype : dtype, optional
Type to use in computing the standard deviation. For arrays of
integer type the default is float64, for arrays of float types
the same as the array type.
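As a quick illustration of the docstring excerpt above, the following
sketch checks the result dtype of np.std for a few inputs. It also shows
that an array containing NaN ends up with a float dtype, which is the
point made about nans in the quoted reply. The variable names are
illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3, 4])
floats32 = np.array([1, 2, 3, 4], dtype=np.float32)

# Integer input: the computation defaults to float64
std_from_ints = np.std(ints)

# float32 input: the result keeps the array's own type
std_from_f32 = np.std(floats32)

# An explicit dtype overrides the default
std_from_f32_as_f64 = np.std(floats32, dtype=np.float64)

# A NaN forces a floating-point container
with_nan = np.array([1, 2, np.nan])
```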
>> Really the function needs at least a rewrite unless numpy can provide
>> same functionality.
> Can you be more specific? We just rewrote the axis handling.
> I think switching to ddof is a good idea. (FYI: I cannot work on
> anything for another two weeks).
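The translation between the two interfaces is small. The sketch below
assumes that bias=True means dividing by N and bias=False means dividing
by N - 1 (the usual convention, but treat it as an assumption here). The
helper name bias_to_ddof is hypothetical and not part of scipy or numpy.

```python
import numpy as np

def bias_to_ddof(bias):
    """Hypothetical helper: translate a boolean `bias` flag into `ddof`."""
    return 0 if bias else 1

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

biased = np.std(x, ddof=bias_to_ddof(True))     # divide by N
unbiased = np.std(x, ddof=bias_to_ddof(False))  # divide by N - 1
```

A deprecation path could accept both keywords for a while and route
`bias` through a mapping like this one, though that is only one possible
design.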
I know that the broadcasting is not correct in the following but I do
not know how to fix it.
Also, np.nansum does not accept a dtype argument, so the input needs to
be converted to the new precision first.
I would like it to handle other array subtypes, or at least fail
explicitly on inputs like masked arrays, the Matrix class, etc.
Perhaps something like this works:
import numpy as np
import scipy.stats as stats
def stdnan(x, axis=None, dtype=None, ddof=0):
    # Only convert if the desired dtype is better than the default float64.
    if dtype == np.float128:
        x = np.asarray(x, dtype=dtype)  # assumed conversion; the original body is missing
    denom = np.isfinite(x).sum(axis=axis)  # number of finite numbers
    # This is not correct because the broadcasting is wrong for axis > 0.
    mean = np.nansum(x, axis=axis) / denom
    diff = x - mean  # x minus the mean, which must broadcast correctly
    return np.sqrt(np.nansum(diff*diff, axis=axis) / (denom - ddof))
a=np.array([[1,2,3], [4, np.nan, 5], [6, 7, np.nan]])
print 'stdnan=:', stdnan(a, axis=None), 'stats.nanstd=:',
print 'stdnan=:', stdnan(a, axis=None, ddof=1), 'stats.nanstd=:',
print 'stdnan=:', stdnan(a, axis=0), 'stats.nanstd=:',
print 'stdnan=:', stdnan(a, axis=0, ddof=1), 'stats.nanstd=:',
print 'The following is wrong because the broadcasting is not correct when computing the difference'
print 'stdnan=:', stdnan(a, axis=1), 'stats.nanstd=:',
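One possible way around the broadcasting problem, sketched here under
the assumption that the installed NumPy supports the keepdims argument
of sum and np.nansum: keep the reduced axis as a length-1 dimension so
that the mean lines up with the input for any axis. The function name
nanstd_keepdims is hypothetical, and unlike the code above it counts
non-NaN values with np.isnan rather than np.isfinite.

```python
import numpy as np

def nanstd_keepdims(x, axis=None, ddof=0):
    """Hypothetical sketch: NaN-ignoring standard deviation for any axis."""
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)
    # keepdims=True leaves the reduced axis in place with length 1,
    # so the mean broadcasts against x whatever the axis is.
    count = valid.sum(axis=axis, keepdims=True)
    mean = np.nansum(x, axis=axis, keepdims=True) / count
    diff = x - mean  # NaN entries stay NaN and are skipped by nansum below
    sum_sq = np.nansum(diff * diff, axis=axis)
    n = valid.sum(axis=axis)
    return np.sqrt(sum_sq / (n - ddof))

a = np.array([[1, 2, 3], [4, np.nan, 5], [6, 7, np.nan]])
row_std = nanstd_keepdims(a, axis=1)
```

On NumPy versions that provide np.nanstd, the results for axis=None, 0
and 1 can be checked against it. A slice whose count of valid values
does not exceed ddof still divides by zero or a negative number, which
this sketch does not guard against.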