[SciPy-Dev] stats.nanstd interface

Bruce Southey bsouthey@gmail....
Wed Jun 16 10:17:42 CDT 2010

On 06/16/2010 09:20 AM, josef.pktd@gmail.com wrote:
> On Wed, Jun 16, 2010 at 10:02 AM, Bruce Southey<bsouthey@gmail.com>  wrote:
>> On 06/16/2010 07:55 AM, Angus McMorland wrote:
>>> Hi all,
>>> I've just updated the docstring for scipy.stats.nanstd to the new
>>> docstring standard's format. I wonder if, for consistency of
>>> interface, we should consider changing it to use a `ddof` parameter,
>>> as numpy's std function does, instead of its current `bias` boolean
>>> parameter. I'm aware that there are deprecation/API implications
>>> associated with this, but I'm not sure what the specifics of those
>>> are.
>>> Angus.
>> Please file a ticket for it.
>> Can you please add all the differences between the signature of
>> numpy's version and that of this version?
>> In particular, the default axis of stats.nanstd is zero compared to None.
>> It also lacks the dtype argument.
> default axis in scipy.stats is zero not None as in numpy.
> np.nansum has no dtype argument, nans can be only in float (I never
> checked complex for this), so I don't know whether dtype would be
> useful in this case.
From the np.std docstring:
    dtype : dtype, optional
        Type to use in computing the standard deviation. For arrays of
        integer type the default is float64, for arrays of float types
        it is the same as the array type.
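
To make the docstring excerpt concrete, here is a small sketch of how
the dtype argument of np.std affects the result type. The array values
are made up for illustration only.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int32)
floats32 = np.array([1.0, 2.0, 3.0], dtype=np.float32)

# Integer input: the computation defaults to float64.
std_ints = np.std(ints)

# float32 input: the result keeps the array's own type.
std_f32 = np.std(floats32)

# An explicit dtype overrides the default for float input.
std_f32_as_f64 = np.std(floats32, dtype=np.float64)

print(std_ints.dtype, std_f32.dtype, std_f32_as_f64.dtype)
```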

>> Really the function needs at least a rewrite unless numpy can provide
>> same functionality.
> Can you be more specific, we just rewrote axis handling
> I think switching to ddof is a good idea. (FYI: I cannot work on
> anything for another two weeks).
> Josef
I know that the broadcasting in the following is not correct, but I do
not know how to fix it.
Also, np.nansum does not accept a dtype argument, so the input needs to
be converted to the new precision first.
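
As a rough sketch of the conversion step, assuming the only goal is to
control the precision of the sum: convert the input first and then call
np.nansum on the converted array. The helper name nansum_as is
hypothetical and not part of numpy or scipy.

```python
import numpy as np

def nansum_as(x, dtype=np.float64, axis=None):
    # Hypothetical helper: cast the input to the requested precision
    # before summing, instead of passing dtype to np.nansum.
    x = np.asarray(x, dtype=dtype)
    return np.nansum(x, axis=axis)

total32 = nansum_as([1.0, np.nan, 2.0], dtype=np.float32)
total64 = nansum_as([1.0, np.nan, 2.0])
print(total32, total32.dtype, total64, total64.dtype)
```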

I would like it to handle other array subtypes, or at least to raise an
error on inputs such as masked arrays, the matrix class, etc.
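
One possible way to reject such inputs up front is sketched below. The
helper name as_plain_array is hypothetical, and refusing these types
with a TypeError is only one option; it is an assumption, not an agreed
behaviour.

```python
import numpy as np

def as_plain_array(x):
    # Hypothetical helper: refuse array subtypes whose semantics differ
    # from a plain ndarray rather than silently mishandling them.
    if isinstance(x, (np.ma.MaskedArray, np.matrix)):
        raise TypeError("masked arrays and matrices are not supported")
    return np.asarray(x)

plain = as_plain_array([1.0, 2.0, np.nan])

try:
    as_plain_array(np.ma.masked_array([1.0, 2.0], mask=[False, True]))
    rejected = False
except TypeError:
    rejected = True
print(type(plain).__name__, rejected)
```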

Perhaps something like this works:

import numpy as np
import scipy.stats as stats

def stdnan(x, axis=None, dtype=None, ddof=0):
    # Only convert if the desired dtype is better than the default float64
    if dtype == np.float128:
        x = np.array(x, dtype=dtype)
    denom = np.isfinite(x).sum(axis=axis)  # number of finite values
    # This is not correct because the broadcasting is wrong for axis > 0
    mean = np.nansum(x, axis=axis) / denom
    diff = x - mean  # x minus the mean, which must broadcast correctly
    return np.sqrt(np.nansum(diff*diff, axis=axis) / (denom - ddof))

a=np.array([[1,2,3], [4, np.nan, 5], [6, 7, np.nan]])
print 'stdnan=:', stdnan(a, axis=None), 'stats.nanstd=:', stats.nanstd(a, axis=None, bias=1)
print 'stdnan=:', stdnan(a, axis=None, ddof=1), 'stats.nanstd=:', stats.nanstd(a, axis=None, bias=0)
print 'stdnan=:', stdnan(a, axis=0), 'stats.nanstd=:', stats.nanstd(a, axis=0, bias=1)
print 'stdnan=:', stdnan(a, axis=0, ddof=1), 'stats.nanstd=:', stats.nanstd(a, axis=0, bias=0)
print 'The following is wrong because the broadcasting is not correct when computing the difference'
print 'stdnan=:', stdnan(a, axis=1), 'stats.nanstd=:', stats.nanstd(a, axis=1, bias=1)
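
One way the broadcasting problem might be handled is to re-insert the
reduced axis into the mean before subtracting, so that it lines up with
the input for any axis. The sketch below is only an illustration under
some assumptions: the name nanstd_sketch is hypothetical, only NaN
(not inf) is treated as missing, the input is always converted to
float64, and empty slices are not handled.

```python
import numpy as np

def nanstd_sketch(x, axis=None, ddof=0):
    # Hypothetical sketch, not the scipy.stats implementation.
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)                    # True where a value is present
    count = valid.sum(axis=axis)            # number of non-NaN values
    mean = np.where(valid, x, 0.0).sum(axis=axis) / count
    if axis is not None:
        # Put the reduced axis back so the mean broadcasts against x.
        mean = np.expand_dims(mean, axis)
    diff = np.where(valid, x - mean, 0.0)   # NaN positions contribute 0
    return np.sqrt((diff * diff).sum(axis=axis) / (count - ddof))

a = np.array([[1, 2, 3], [4, np.nan, 5], [6, 7, np.nan]])
print(nanstd_sketch(a, axis=1))
print(nanstd_sketch(a, axis=0, ddof=1))
```

In the comparisons above, bias=1 is paired with ddof=0 and bias=0 with
ddof=1, so the same pairing would apply when checking this sketch
against stats.nanstd.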

