[Scipy-tickets] [SciPy] #1583: mstats.chisquare and stats.chisquare documentation is out of date

SciPy Trac scipy-tickets@scipy....
Thu Jan 12 10:58:37 CST 2012


#1583: mstats.chisquare and stats.chisquare documentation is out of date
-------------------------+--------------------------------------------------
 Reporter:  dloewenherz  |       Owner:  somebody   
     Type:  defect       |      Status:  new        
 Priority:  normal       |   Milestone:  Unscheduled
Component:  scipy.stats  |     Version:  devel      
 Keywords:               |  
-------------------------+--------------------------------------------------

Comment(by warren.weckesser):

 Replying to [ticket:1583 dloewenherz]:
 > See
 > http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.chisquare.html
 > and
 > http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html
 >
 > It indicates you can specify `ddof`, the degrees of freedom, for the
 > p-value of the chi square test. For stats.chisquare, ddof seems to have no
 > effect.

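 Take another look at the docstring.  `ddof` is the *adjustment* to the
 default degrees of freedom, which is the number of observed frequencies
 (categories) minus 1 (written `k-1` in the docstring, but unfortunately
 without mentioning what `k` is).  The p-value is computed with `k-1-ddof`
 degrees of freedom, so `ddof` changes the p-value but not the chi-square
 statistic itself.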
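 To make the role of `ddof` concrete, here is a minimal pure-Python sketch, assuming 1-D input. `chisquare_sketch` and `chi2_sf` are hypothetical helpers written for illustration, not SciPy functions. The first mirrors the quoted stats.chisquare source. The second evaluates the chi-square survival function through its closed forms for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1 (illustrative helper).

    Uses the closed forms of the upper regularized incomplete gamma
    function Q(df/2, x/2) for integer and half-integer arguments.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y**i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df = 2m + 1: Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{m} y**(i - 1/2) / Gamma(i + 1/2)
    tail = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, df // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def chisquare_sketch(f_obs, f_exp=None, ddof=0):
    """Hypothetical 1-D re-implementation mirroring the quoted source."""
    k = len(f_obs)
    if f_exp is None:
        # Uniform expected frequencies: the total count spread evenly over k categories.
        f_exp = [sum(f_obs) / float(k)] * k
    chisq = sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))
    # ddof only lowers the degrees of freedom used for the p-value.
    return chisq, chi2_sf(chisq, k - 1 - ddof)


print(chisquare_sketch([1, 2, 3]))          # dof = 3 - 1 - 0 = 2
print(chisquare_sketch([1, 2, 3], ddof=1))  # dof = 3 - 1 - 1 = 1
```

 With `[1, 2, 3]` the statistic is 1.0 in both calls. The p-value drops from exp(-0.5), about 0.607, with 2 degrees of freedom, to about 0.317 with 1.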


 This is the source code for stats.chisquare:
 {{{
     f_obs = asarray(f_obs)
     k = len(f_obs)
     if f_exp is None:
         f_exp = array([np.sum(f_obs,axis=0)/float(k)] * len(f_obs),float)
     f_exp = f_exp.astype(float)
     chisq = np.add.reduce((f_obs-f_exp)**2 / f_exp)
     return chisq, chisqprob(chisq, k-1-ddof)
 }}}
 This shows that if you pass in a two-dimensional array, each column is
 treated as a separate set of observations.  For example:
 {{{
 In [7]: obs1 = [1,2,3]

 In [8]: obs2 = [4,5,4]

 In [9]: chisquare(obs1)
 Out[9]: (1.0, 0.60653065971263342)

 In [10]: chisquare(obs2)
 Out[10]: (0.15384615384615388, 0.92596107864231603)

 In [11]: m = array([obs1,obs2]).T

 In [12]: m
 Out[12]:
 array([[1, 4],
        [2, 5],
        [3, 4]])

 In [13]: chisquare(m)
 Out[13]: (array([ 1.        ,  0.15384615]), array([ 0.60653066,  0.92596108]))

 }}}
 Note that the values in `chisquare(m)` are the same as those of
 `chisquare(obs1)` and `chisquare(obs2)`.

 When given an n-dimensional array, it treats each one-dimensional slice
 along the first dimension as a separate set of observations.  E.g. if you
 give an array of shape (3,4,5), you'll get back two arrays of shape (4,5):
 the chi-square statistics and the p-values.
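 The n-dimensional behavior can be sketched with NumPy, assuming `f_exp` is omitted and `ddof` is 0. This mirrors only the statistic part of the quoted source, using broadcasting in place of the `[...] * len(f_obs)` list repetition. It is an illustration, not SciPy's code, and the random counts are arbitrary. The p-values would be the chi-square survival function of each statistic at `k-1-ddof` degrees of freedom, giving the second (4,5) array.

```python
import numpy as np

# Illustrative positive counts; the first axis holds the k = 3 categories.
rng = np.random.default_rng(0)
f_obs = rng.integers(1, 10, size=(3, 4, 5)).astype(float)

k = len(f_obs)                        # len() of an n-d array is its first-axis length
f_exp = f_obs.sum(axis=0) / float(k)  # uniform expectation per slice, shape (4, 5)
# Broadcasting compares each slice f_obs[:, i, j] with its own expectation f_exp[i, j].
chisq = ((f_obs - f_exp) ** 2 / f_exp).sum(axis=0)
dof = k - 1                           # one degrees-of-freedom value shared by all slices

print(chisq.shape)  # (4, 5)

# Each entry equals the 1-D statistic of the corresponding slice.
for i in range(4):
    for j in range(5):
        s = f_obs[:, i, j]
        e = s.mean()
        assert np.isclose(chisq[i, j], ((s - e) ** 2 / e).sum())
```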

 This might seem like a feature, but since it is not documented, it could
 be considered a bug.

 It would be better to document this behavior, and also to add an `axis`
 keyword to the function so you can choose the axis along which the
 calculation is performed.
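 One possible shape for such an `axis` keyword is sketched below. `chisquare_stat` is a hypothetical function, not SciPy's API. It assumes `f_exp`, if given, has the same shape as `f_obs`. It returns only the statistics and the degrees of freedom, leaving the p-value computation out. The idea is to move the requested axis to the front with `np.moveaxis` and then reuse the axis-0 logic of the quoted source. Later SciPy releases did add an `axis` parameter to `scipy.stats.chisquare`, so check the current documentation for the real signature.

```python
import numpy as np


def chisquare_stat(f_obs, f_exp=None, ddof=0, axis=0):
    """Hypothetical sketch: chi-square statistic(s) along `axis`, plus dof.

    f_exp, if given, is assumed to have the same shape as f_obs.
    """
    # Put the categories on axis 0 so the original axis-0 logic applies unchanged.
    f_obs = np.moveaxis(np.asarray(f_obs, dtype=float), axis, 0)
    k = f_obs.shape[0]
    if f_exp is None:
        f_exp = f_obs.sum(axis=0) / k  # uniform expected counts for each slice
    else:
        f_exp = np.moveaxis(np.asarray(f_exp, dtype=float), axis, 0)
    chisq = ((f_obs - f_exp) ** 2 / f_exp).sum(axis=0)
    return chisq, k - 1 - ddof


rows = np.array([[1, 2, 3], [4, 5, 4]])  # obs1 and obs2 as rows instead of columns
print(chisquare_stat(rows, axis=1))      # statistics 1.0 and 2/13 (about 0.1538), dof 2
```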

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1583#comment:5>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.

