[SciPy-dev] RFR: Proposed fixes in scipy.stats functions for calculation of variance/error/etc.

Bruce Southey bsouthey@gmail....
Mon Oct 26 11:39:15 CDT 2009

On Mon, Oct 26, 2009 at 12:32 AM, Ariel Rokem <arokem@berkeley.edu> wrote:
> Hi Pierre,
> I agree - let's see how we work things out for stats and then, we can
> copy over whatever behavior we settle on to mstats as well.
> Cheers,
> Ariel
> On Sun, Oct 25, 2009 at 10:18 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
>> On Oct 26, 2009, at 12:59 AM, josef.pktd@gmail.com wrote:
>>> zs was the list version for the zscore using z to calculate, the
>>> translation in the next changeset is correct only for 1d or raveled
>>> arrays, but it is missing an axis argument. It looks like z was a
>>> helper function for a scalar score.
>>> zmap got imported in this form in revision 71.
>>> stats.mstats has the same functions, but they look like literal
>>> translations since they have the same (ambiguous) treatment of axis
>>> if it's not 1d.
>>> stats.mstats.z has ddof=1, the others ddof=0
>> well, maybe it's time to start cleaning up mstats. For the z
>> functions, that should be straightforward, provided we don't lose the
>> mask with np.asarray (a np.asanyarray would be sufficient). In that
>> case, we could probably drop support for them in mstats. At least, we
>> should make sure that the mstats versions have the same defaults as
>> the stats ones.
>> _______________________________________________
>> Scipy-dev mailing list
>> Scipy-dev@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
> --
> Ariel Rokem
> Helen Wills Neuroscience Institute
> University of California, Berkeley
> http://argentum.ucbso.berkeley.edu/ariel

For stats.z, stats.zs and stats.zmap the calculation is the same
whether or not the input array is masked, so there is no need for a
separate version for every type of ndarray. Please create a single
version that works with at least the standard ndarray and the masked
array.
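
A minimal sketch of why one version can serve both array types,
assuming NumPy's numpy.ma module: np.asanyarray passes a MaskedArray
through unchanged while np.asarray returns the underlying data without
the mask, and the .mean()/.std() methods of a masked array skip masked
entries. The helper name standardize is hypothetical and only handles
the whole-array case.

```python
import numpy as np

def standardize(a, ddof=0):
    # hypothetical helper: asanyarray keeps MaskedArray subclasses intact
    a = np.asanyarray(a)
    # for a MaskedArray, .mean() and .std() ignore masked entries
    return (a - a.mean()) / a.std(ddof=ddof)

data = np.ma.masked_array([1.0, 2.0, 3.0, 100.0],
                          mask=[False, False, False, True])

print(type(np.asarray(data)))     # plain ndarray: the mask is dropped
print(type(np.asanyarray(data)))  # MaskedArray: the mask is kept
print(standardize(data))          # the masked 100.0 does not affect the result
```

The same function body also works unchanged on a plain list or ndarray,
which is the point of having a single version.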

I would agree to having a standardization function such as stats.zs()
that returns the standardized value (normal deviate, or z-score). But
it needs to allow standardization along a given axis.
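
Standardizing along an axis mostly comes down to making the center and
scale broadcast back against the input. A sketch assuming NumPy's
keepdims=True option of mean() and std(); the name standardize_axis is
hypothetical.

```python
import numpy as np

def standardize_axis(a, axis=0, ddof=0):
    a = np.asanyarray(a)
    # keepdims=True leaves a length-1 dimension in place of the reduced
    # axis, so the result broadcasts against `a` for any axis, not just 0
    center = a.mean(axis=axis, keepdims=True)
    scale = a.std(axis=axis, ddof=ddof, keepdims=True)
    return (a - center) / scale

x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
print(standardize_axis(x, axis=0))  # each column standardized separately
print(standardize_axis(x, axis=1))  # each row standardized separately
```

Without keepdims, axis=0 happens to broadcast correctly for 2d input,
but axis=1 would not, which is one way the "correct only for 1d"
problem from the quoted thread shows up.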

But I am not sure of the utility of the other two functions, since
stats.zmap() is just an extension of stats.z().

Ignoring axis, all these functions are of the form:
 (input - constant) / standard deviation
where:
 input is an array-like that can be converted into an array, so it can
be a scalar or an array.
 constant is used to center the input. It can be the mean of the input
or a user-supplied value, either a scalar or one computed from an
array-like input.
 standard deviation is used to scale ('standardize') the input. It can
be the standard deviation of the input or a user-supplied value,
either a scalar or one computed from an array-like input.

So the current three functions are:
 stats.zs(a): input=a, constant=a.mean(), standard deviation=a.std()
 stats.z(a, score): input=score, constant=a.mean(),
  standard deviation=a.std()
 stats.zmap(scores, compare): input=scores, constant=compare.mean(),
  standard deviation=compare.std()
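
To illustrate that the three functions differ only in where the center
and scale come from, here is a NumPy sketch that expresses each one as
a call to a single helper. The names _standardize, zs_like, z_like and
zmap_like are hypothetical, only 1d inputs are handled, and ddof=0 is
assumed to match the existing stats defaults reported in the quoted
thread.

```python
import numpy as np

def _standardize(values, reference, ddof=0):
    # center and scale are taken from `reference` and applied to `values`
    reference = np.asanyarray(reference, dtype=float)
    values = np.asanyarray(values, dtype=float)
    return (values - reference.mean()) / reference.std(ddof=ddof)

def zs_like(a):
    # every element of a, relative to a itself
    return _standardize(a, a)

def z_like(a, score):
    # a single score, relative to the distribution of a
    return _standardize(score, a)

def zmap_like(scores, compare):
    # many scores, relative to the distribution of compare
    return _standardize(scores, compare)

a = [1.0, 2.0, 3.0, 4.0, 5.0]
print(zs_like(a))
print(z_like(a, 3.0))            # 0.0, since 3.0 is the mean of a
print(zmap_like([1.0, 5.0], a))  # same values as zs_like(a) at 1.0 and 5.0
```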

Assuming the axis works correctly, perhaps you can do something like this:

import numpy as np

def zscore(a, constant=None, stddev=None, axis=0, ddof=1):
    # np.asanyarray keeps masked arrays masked (np.asarray would drop the
    # mask) and still converts any other array-like into an ndarray
    a = np.asanyarray(a)
    if constant is None:
        # center on the mean of the input itself; keepdims=True lets the
        # result broadcast against a for any axis
        constant = a.mean(axis=axis, keepdims=True)
    if stddev is None:
        stddev = a.std(axis=axis, ddof=ddof, keepdims=True)
    elif np.ndim(stddev) > 0:
        # Inappropriate, but something is needed in case the user wants
        # the std of a different array-like input
        stddev = np.std(stddev, axis=axis, ddof=ddof, keepdims=True)
    return (a - constant) / stddev
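
The proposal defaults to ddof=1 while, per the quoted thread, stats.zs,
stats.z and stats.zmap use ddof=0, so the choice of default matters:
for n observations the two versions differ by the constant factor
sqrt((n-1)/n). A quick NumPy check of that factor, with illustrative
data only:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

# population (ddof=0) vs sample (ddof=1) standard deviation
z_pop = (x - x.mean()) / x.std(ddof=0)
z_samp = (x - x.mean()) / x.std(ddof=1)

# std(ddof=1) = std(ddof=0) * sqrt(n / (n - 1)), so sample z-scores are smaller
ratio = np.sqrt((n - 1) / n)
print(np.allclose(z_samp, z_pop * ratio))  # True
```

Whichever default is chosen, stats and mstats should agree on it.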

