[SciPy-dev] RFR: Proposed fixes in scipy.stats functions for calculation of variance/error/etc.

josef.pktd@gmai...
Sun Oct 25 23:19:54 CDT 2009


On Sun, Oct 25, 2009 at 11:49 PM, Ariel Rokem <arokem@berkeley.edu> wrote:
> Hi Josef and all,
>
> thanks for looking. Concerning the z-score functions - I am also
> confused by those and I would suggest unifying them under one
> function. In particular, I can't imagine what the function 'z' is for.
> However, I don't want to just remove these without discussion. What do
> you think about this?
>
> Another, more general thing, concerning the axis - I am wondering: why
> is the default axis in scipy 0, while the default in numpy (in
> np.mean, for example) is None? I think that it would be good to have
> one convention for both libraries, and that the more parsimonious
> one is the one using None as the default value, since by default it
> doesn't favor any dimension of an array over the others. I don't
> know - how widespread is this convention within scipy?

I had to run after the last message. My impression was that in one
of the changes the ddof=1 may have been lost, i.e. the distinction
that scipy.stats made between population and sample statistics.
z and zmap look the same to me in their intended (?) calculation,
but zmap mixes up the axis arguments (mean uses "axis", std uses a
hardcoded axis=0). Maybe the intention will be clearer when I look
at the trac history or the original stats package.
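A minimal illustration of the population versus sample distinction, using NumPy's `ddof` argument: `ddof=0` divides by N (population variance) and `ddof=1` divides by N-1 (sample variance, the "dof correction"). This sketch does not restate which divisor each of the old scipy.stats functions used; it only shows how the two conventions differ.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Population variance: sum of squared deviations divided by N.
var_pop = np.var(x, ddof=0)
# Sample variance: sum of squared deviations divided by N - 1.
var_sample = np.var(x, ddof=1)

print(var_pop, var_sample)
```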

From looking at the three functions, I would assume that the combined
function would have a signature like

def zscore(a, compare=None, axis=0, ddof=0)

or two functions, one with compare and one without?
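A minimal sketch of what such a combined function might look like. The name `zscore` and the `compare` argument come from the proposed signature; the body is an assumption, not the scipy implementation. Here `compare`, if given, is assumed to supply the reference mean and standard deviation (computed along `axis`) used to standardize `a`.

```python
import numpy as np

def zscore(a, compare=None, axis=0, ddof=0):
    """Hypothetical combined z-score function (sketch only).

    Standardizes `a` with the mean and standard deviation of `compare`,
    or of `a` itself when `compare` is None, computed along `axis`.
    """
    a = np.asarray(a, dtype=float)
    ref = a if compare is None else np.asarray(compare, dtype=float)
    if axis is None:
        # Raveled statistics: a single scalar mean and std.
        mu = ref.mean()
        sigma = ref.std(ddof=ddof)
    else:
        # keepdims=True keeps the reduced axis so the result broadcasts against `a`.
        mu = ref.mean(axis=axis, keepdims=True)
        sigma = ref.std(axis=axis, ddof=ddof, keepdims=True)
    return (a - mu) / sigma
```

With `compare=None` this covers the zs case; passing a separate reference array covers the "compare" case, so one function may be enough.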


About default axis=0:

I think this is scipy.stats specific. We had a brief discussion a year
ago, where Jarrod agreed that default for stats should remain axis=0.

In statistics, you almost never want to ravel the data and mix apples
and cars, or prices and quantities. So the default should be to reduce
along an axis, e.g. the mean over all observations for each variable.
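To make this concrete: with NumPy's default `axis=None`, `np.mean` averages over the raveled array and mixes the two variables, while `axis=0` gives one mean per variable (column). The numbers below are made up for illustration.

```python
import numpy as np

# Rows are observations, columns are variables: price and quantity (made-up numbers).
data = np.array([[2.0, 100.0],
                 [4.0, 300.0]])

mixed = np.mean(data)                 # axis=None: one number mixing prices and quantities
per_variable = np.mean(data, axis=0)  # one mean per column (variable)
print(mixed, per_variable)
```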

axis=0 rather than axis=-1 is traditional in statistics/econometrics, both
in other matrix packages (GAUSS, MATLAB) and in the textbook treatment
(in the books that I know). Switching to -1 for the data would be a big
mental break and would require translating the axes in the textbook
formulas, e.g. solving X'X beta = X'Y.
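For example, with observations in rows (the axis=0 convention), the textbook normal equations X'X beta = X'Y translate directly into NumPy. This is a minimal sketch with made-up, exactly linear data; for real regressions `np.linalg.lstsq` is the numerically safer choice.

```python
import numpy as np

# Each row is one observation; columns are a constant and one regressor.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x                      # exact line: intercept 1, slope 2

# Normal equations as written in the textbook: (X'X) beta = X'y
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)
```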

From my perspective, losing axis=0 as the default is the main disadvantage
of removing mean, var, and so on from scipy.stats. E.g., I need to create
a lambda function if I want mean(x, axis=0) as a callback function.
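A sketch of the workaround: because `np.mean` reduces over the raveled array by default, a callback that should compute column means needs a wrapper. Either a lambda or `functools.partial` works; `apply_stat` is a hypothetical helper standing in for any function that takes a one-argument callback.

```python
import functools
import numpy as np

def apply_stat(stat, datasets):
    """Hypothetical helper: apply a one-argument statistic callback to each dataset."""
    return [stat(d) for d in datasets]

datasets = [np.array([[1.0, 10.0], [3.0, 30.0]]),
            np.array([[2.0, 4.0], [4.0, 8.0]])]

# Both wrappers fix axis=0 so the callback can be called with a single argument.
col_mean_lambda = lambda x: np.mean(x, axis=0)
col_mean_partial = functools.partial(np.mean, axis=0)

res_lambda = apply_stat(col_mean_lambda, datasets)
res_partial = apply_stat(col_mean_partial, datasets)
```

`functools.partial` avoids writing a new function body, but it is still a wrapper that a stats-specific `mean` with a default of axis=0 would not need.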

Cheers,

Josef

>
> Cheers,
>
> Ariel
>
> On Sun, Oct 25, 2009 at 8:16 PM,  <josef.pktd@gmail.com> wrote:
>> On Sun, Oct 25, 2009 at 10:50 PM, Ariel Rokem <arokem@berkeley.edu> wrote:
>>> Hi everyone,
>>>
>>> I have been working on some fixes to the functions in scipy.stats
>>> which calculate variance/error and related quantities. In particular,
>>> in order to comply with the deprecation warnings that appear when using
>>> scipy.stats.samplevar/scipy.stats.samplestd, I have replaced use of
>>> these functions with calls to np.std/np.var. I have also cleaned up
>>> the documentation a bit.
>>>
>>> This can all be found here: http://codereview.appspot.com/141051
>>>
>>> Cheers,
>>>
>>> Ariel
>>
>> I just gave it a quick look; it looks good so far.
>>
>> In def zs, there seems to be a shape error for axis>0 in
>> "return (a-mu)/sigma"
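A quick check of the shape problem, as a sketch: for a 2-D array, `np.mean(a, axis=1)` has shape `(nrows,)`, which broadcasts against `a` along the last axis, not along the rows. For a non-square array this raises `ValueError`; for a square array it would silently subtract the wrong means. `broadcast_fails` and `zs_fixed` are hypothetical names; the fix shown keeps the reduced axis with `keepdims=True`, which is available in newer NumPy versions.

```python
import numpy as np

def broadcast_fails(a, axis):
    """Return True if the patched (a - mu) / sigma raises for this axis."""
    mu = np.mean(a, axis)
    sigma = np.std(a, axis)
    try:
        (a - mu) / sigma
    except ValueError:
        return True
    return False

def zs_fixed(a, axis=0):
    """Sketch: z-scores along `axis`, keeping the reduced axis so shapes broadcast."""
    a = np.asarray(a, dtype=float)
    mu = np.mean(a, axis=axis, keepdims=True)
    sigma = np.std(a, axis=axis, keepdims=True)
    return (a - mu) / sigma

a = np.arange(6, dtype=float).reshape(2, 3)  # 2 rows, 3 columns
```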
>>
>>
>> def zs changes the definition: before, it normalized with the raveled
>> mean and std, not by axis:
>>
>> - mu = np.mean(a,None)
>> - sigma = samplestd(a)
>> - return (array(a)-mu)/sigma
>>
>> + a,axis = _chk_asarray(a,axis)
>> + mu = np.mean(a,axis)
>> + sigma = np.std(a,axis)
>> + return (a-mu)/sigma
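The difference between the two definitions, as a small sketch on made-up data: the old version subtracts one mean and divides by one standard deviation for the whole raveled array, while the patched version standardizes each column separately. Both use NumPy's default `ddof=0` here for simplicity; that is an assumption, not a statement about what samplestd computed.

```python
import numpy as np

a = np.array([[1.0, 10.0],
              [3.0, 30.0]])

# Old behaviour (sketch): one mean/std over the raveled array.
z_raveled = (a - a.mean()) / a.std()
# Patched behaviour (sketch): mean/std per column (axis=0).
z_by_axis = (a - a.mean(axis=0)) / a.std(axis=0)
```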
>>
>> I never looked closely at these,
>> zmap has a description I don't understand.
>>
>> z, zs, zm  ???
>>
>> Which is which? They look a bit inconsistent; "population" might refer
>> to the dof correction in z?
>> Is there a standard terminology for z-scores?
>>
>> For axis, I think I have seen "int or None" more often?
>>
>> Josef
>>
>>> --
>>> Ariel Rokem
>>> Helen Wills Neuroscience Institute
>>> University of California, Berkeley
>>> http://argentum.ucbso.berkeley.edu/ariel
>>> _______________________________________________
>>> Scipy-dev mailing list
>>> Scipy-dev@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>>
>>
>
