[SciPy-Dev] stats.nanstd interface
Wed Jun 16 11:26:25 CDT 2010
On Wed, Jun 16, 2010 at 10:17 AM, Bruce Southey <email@example.com> wrote:
> On 06/16/2010 09:20 AM, firstname.lastname@example.org wrote:
>> On Wed, Jun 16, 2010 at 10:02 AM, Bruce Southey <email@example.com> wrote:
>>> On 06/16/2010 07:55 AM, Angus McMorland wrote:
>>>> Hi all,
>>>> I've just updated the docstring for scipy.stats.nanstd to the new
>>>> docstring standard's format. I wonder if, for consistency of
>>>> interface, we should consider changing it to use a `ddof` parameter,
>>>> as numpy's std function does, instead of its current `bias` boolean
>>>> parameter. I'm aware that there are deprecation/API implications
>>>> associated with this, but I'm not sure what the specifics of those are.
>>> Please file a ticket for it.
>>> Can you please list all the differences in signature between
>>> numpy's version and this version?
>>> In particular, the default axis of stats.nanstd is zero compared to None.
>>> It also lacks the dtype argument.
>> The default axis in scipy.stats is zero, not None as in numpy.
>> np.nansum has no dtype argument, and nans can only occur in floats (I
>> never checked complex for this), so I don't know whether dtype would be
>> useful in this case.
> From np.std docstring:
> dtype : dtype, optional
> Type to use in computing the standard deviation. For arrays of
> integer type the default is float64, for arrays of float types it is
> the same as the array type.
>>> Really the function needs at least a rewrite unless numpy can provide
>>> the same functionality.
>> Can you be more specific? We just rewrote the axis handling.
>> I think switching to ddof is a good idea. (FYI: I cannot work on
>> anything for another two weeks).
> I know that the broadcasting is not correct in the following but I do not
> know how to fix it.
> Also, np.nansum does not accept the dtype argument, so the input needs
> to be converted to the new precision first.
> I would like it to handle other array subtypes or at least fail to work on
> inputs like masked arrays, Matrix class etc.
> Perhaps something like this works:
> import numpy as np
> import scipy.stats as stats
>
> def stdnan(x, axis=None, dtype=None, ddof=0):
>     # Only convert if the desired dtype is better than the default
>     # float64 dtype.
>     if dtype == np.float128:
>         x = np.array(x, dtype=dtype)
>     denom = np.isfinite(x).sum(axis=axis)  # number of finite numbers
>     # This is not correct because the broadcasting is wrong for axis > 0.
>     mean = np.nansum(x, axis=axis) / denom
>     # x minus the mean, which must broadcast correctly
>     diff = x - mean
>     return np.sqrt(np.nansum(diff * diff, axis=axis) / (denom - ddof))
>
> a = np.array([[1, 2, 3], [4, np.nan, 5], [6, 7, np.nan]])
> print 'stdnan=:', stdnan(a, axis=None), 'stats.nanstd=:', stats.nanstd(a, axis=None, bias=1)
> print 'stdnan=:', stdnan(a, axis=None, ddof=1), 'stats.nanstd=:', stats.nanstd(a, axis=None, bias=0)
> print 'stdnan=:', stdnan(a, axis=0), 'stats.nanstd=:', stats.nanstd(a, axis=0, bias=1)
> print 'stdnan=:', stdnan(a, axis=0, ddof=1), 'stats.nanstd=:', stats.nanstd(a, axis=0, bias=0)
> print 'The following is wrong because the broadcasting is not correct when computing the difference'
> print 'stdnan=:', stdnan(a, axis=1), 'stats.nanstd=:', stats.nanstd(a, axis=1, bias=1)
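
For readers of the archive, here is one way the broadcasting problem in the quoted code could be handled. It is only a sketch under stated assumptions, not the code attached to the ticket: the name nanstd_sketch is hypothetical, it is written for Python 3, and it relies on the keepdims argument of np.nansum, which is available in current NumPy releases that postdate this thread. Keeping the reduced axis as a length-1 dimension lets the mean broadcast against x for any axis.

```python
import numpy as np

def nanstd_sketch(x, axis=None, ddof=0):
    """Standard deviation ignoring NaNs (illustrative sketch only)."""
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)
    # Number of non-NaN values along the reduced axis.
    n = valid.sum(axis=axis)
    # keepdims=True keeps the reduced axis with length 1, so the mean
    # broadcasts against x regardless of which axis was reduced.
    mean = (np.nansum(x, axis=axis, keepdims=True)
            / valid.sum(axis=axis, keepdims=True))
    # NaN entries stay NaN in diff and are skipped by nansum below.
    diff = x - mean
    return np.sqrt(np.nansum(diff * diff, axis=axis) / (n - ddof))

a = np.array([[1, 2, 3], [4, np.nan, 5], [6, 7, np.nan]])
row_std = nanstd_sketch(a, axis=1)
```

With the array a from the quoted example, the second row [4, nan, 5] has mean 4.5, so its standard deviation with ddof=0 is 0.5. In terms of the interface question, ddof=0 corresponds to the biased estimate and ddof=1 to the unbiased one.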
Thanks Angus for the ticket 1200:
I added code to the ticket that I think fixes the broadcasting issue I
mentioned above and adds 'support' for masked array input. I also
created the variance function, since standard deviation is the square
root of the variance.
I really think that all these stats 'nan functions' could probably
just convert their input into masked arrays and use the appropriate
masked array functions, instead of being maintained as separate
implementations. This would also address how to handle the 'out'
argument.
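
The masked-array idea above can be sketched roughly as follows. This is an illustration and not the code attached to the ticket: the name nanstd_ma is hypothetical, and it assumes Python 3 with a current NumPy. Note that np.ma.masked_invalid masks infinities as well as NaNs, which may or may not be the desired behavior.

```python
import numpy as np

def nanstd_ma(x, axis=None, dtype=None, ddof=0):
    """NaN-aware standard deviation via masked arrays (sketch only)."""
    # Mask NaN and infinite entries; an existing mask on x is preserved.
    masked = np.ma.masked_invalid(np.asanyarray(x))
    # MaskedArray.std accepts axis, dtype and ddof like np.std does.
    return masked.std(axis=axis, dtype=dtype, ddof=ddof)

a = np.array([[1, 2, 3], [4, np.nan, 5], [6, 7, np.nan]])
col_std = nanstd_ma(a, axis=0, ddof=1)
```

When an axis is given the result is itself a masked array, so a caller who wants a plain ndarray would have to fill or convert it. MaskedArray.std also takes an out argument, which is one reason this route could simplify the interface.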
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 1820 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/scipy-dev/attachments/20100616/0239491c/attachment.obj