[Scipy-tickets] [SciPy] #1583: mstats.chisquare and stats.chisquare documentation is out of date
SciPy Trac
scipy-tickets@scipy....
Thu Jan 12 10:58:37 CST 2012
#1583: mstats.chisquare and stats.chisquare documentation is out of date
-------------------------+--------------------------------------------------
Reporter: dloewenherz | Owner: somebody
Type: defect | Status: new
Priority: normal | Milestone: Unscheduled
Component: scipy.stats | Version: devel
Keywords: |
-------------------------+--------------------------------------------------
Comment(by warren.weckesser):
Replying to [ticket:1583 dloewenherz]:
> See
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.chisquare.html
> and
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html
>
> It indicates you can specify `ddof`, the degrees of freedom, for the
> p-value of the chi square test. For stats.chisquare, ddof seems to have no
> effect.
Take another look at the docstring. `ddof` is the *adjustment* to the
default degrees of freedom, which is the number of observed frequencies
(the length of `f_obs`) minus 1. The docstring writes this as `k-1`, but
unfortunately never says what `k` is.
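To see that `ddof` does have an effect, compare the p-values for the same data with `ddof=0` and `ddof=1`. This is a small sketch assuming a current SciPy release, where `scipy.stats.chisquare` still accepts `ddof` and `scipy.stats.chi2.sf` is the chi-square survival function; the data `[1, 2, 3]` is just an example.

```python
from scipy.stats import chi2, chisquare

obs = [1, 2, 3]   # example data: k = 3 observed frequencies
k = len(obs)

results = {}
for ddof in (0, 1):
    stat, p = chisquare(obs, ddof=ddof)
    # The statistic itself does not depend on ddof; only the degrees of
    # freedom used for the p-value, k - 1 - ddof, change.
    results[ddof] = (stat, p)
    assert abs(p - chi2.sf(stat, k - 1 - ddof)) < 1e-12

print(results)
```

The statistic stays at 1.0 in both cases, while the p-value drops from about 0.61 (2 degrees of freedom) to about 0.32 (1 degree of freedom).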
This is the source code for stats.chisquare:
{{{
f_obs = asarray(f_obs)
k = len(f_obs)
if f_exp is None:
    f_exp = array([np.sum(f_obs,axis=0)/float(k)] * len(f_obs),float)
f_exp = f_exp.astype(float)
chisq = np.add.reduce((f_obs-f_exp)**2 / f_exp)
return chisq, chisqprob(chisq, k-1-ddof)
}}}
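The excerpt above is only the body of the function and relies on `chisqprob`, which has since been removed from `scipy.stats`. A runnable sketch of the same logic is below; `chisquare_sketch` is a hypothetical name, and `scipy.stats.chi2.sf` is swapped in for `chisqprob` (both compute the upper-tail probability of the chi-square distribution). It is meant to illustrate the quoted code, not to reproduce SciPy's current implementation.

```python
import numpy as np
from scipy.stats import chi2


def chisquare_sketch(f_obs, f_exp=None, ddof=0):
    """Hypothetical re-creation of the quoted stats.chisquare body."""
    f_obs = np.asarray(f_obs, dtype=float)
    k = len(f_obs)  # number of observed frequencies along axis 0
    if f_exp is None:
        # Uniform expected frequencies: the mean along axis 0, repeated k times
        f_exp = np.ones_like(f_obs) * (f_obs.sum(axis=0) / k)
    f_exp = np.asarray(f_exp, dtype=float)
    # np.add.reduce reduces along axis 0 by default, so columns stay separate
    chisq = np.add.reduce((f_obs - f_exp) ** 2 / f_exp)
    # Degrees of freedom are k - 1, further reduced by ddof
    return chisq, chi2.sf(chisq, k - 1 - ddof)
```

Calling `chisquare_sketch([1, 2, 3])` gives a statistic of 1.0, and the p-value equals `chi2.sf(1.0, 2)`.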
Because both `np.sum(f_obs,axis=0)` and `np.add.reduce` work along the first
axis, passing in a two-dimensional array means each column is treated as a
separate set of observations. For example:
{{{
In [7]: obs1 = [1,2,3]
In [8]: obs2 = [4,5,4]
In [9]: chisquare(obs1)
Out[9]: (1.0, 0.60653065971263342)
In [10]: chisquare(obs2)
Out[10]: (0.15384615384615388, 0.92596107864231603)
In [11]: m = array([obs1,obs2]).T
In [12]: m
Out[12]:
array([[1, 4],
       [2, 5],
       [3, 4]])
In [13]: chisquare(m)
Out[13]: (array([ 1. , 0.15384615]), array([ 0.60653066, 0.92596108]))
}}}
Note that the values in `chisquare(m)` are the same as those of
`chisquare(obs1)` and `chisquare(obs2)`.
When given an n-dimensional array, it treats each one-dimensional slice
along the first dimension as a separate set of observations. For example,
if you give it an array of shape (3,4,5), you'll get back two arrays (the
statistics and the p-values), each of shape (4,5).
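The shape behaviour can be checked directly. This sketch assumes a current SciPy release, where `scipy.stats.chisquare` still reduces along axis 0 by default; the random data is purely illustrative.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)
# Illustrative data: 3 categories (axis 0) for each of 4 x 5 data sets
f_obs = rng.integers(1, 20, size=(3, 4, 5))

stat, p = chisquare(f_obs)  # default reduction along axis 0
print(stat.shape, p.shape)

# Each entry matches a separate call on the corresponding 1-D slice
i, j = 2, 3
s_ij, p_ij = chisquare(f_obs[:, i, j])
assert np.isclose(stat[i, j], s_ij) and np.isclose(p[i, j], p_ij)
```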
This might seem like a feature, but since it is not documented, it could
be considered a bug.
What would be better is to document this feature, and also add an `axis`
keyword to the function, so you can choose the axis along which the
calculation is performed.
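For comparison, current SciPy releases of `scipy.stats.chisquare` accept an `axis` keyword along these lines (default `axis=0`, with `axis=None` pooling all values into one data set). The sketch below reuses the two example data sets from the session above and assumes such a release is installed.

```python
import numpy as np
from scipy.stats import chisquare

obs1 = [1, 2, 3]
obs2 = [4, 5, 4]
m = np.array([obs1, obs2]).T  # shape (3, 2): one data set per column

# axis=0 (the default): each column is one set of observations
stat_cols, p_cols = chisquare(m, axis=0)

# axis=1 on the transposed array: each row is one set of observations
stat_rows, p_rows = chisquare(m.T, axis=1)

# axis=None: all six values are treated as a single data set
stat_all, p_all = chisquare(m, axis=None)
```

The column-wise and row-wise calls give the same statistics as `chisquare(obs1)` and `chisquare(obs2)`, while `axis=None` returns a single statistic.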
--
Ticket URL: <http://projects.scipy.org/scipy/ticket/1583#comment:5>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.