[Scipy-tickets] [SciPy] #1203: stats: chi-square test of independence

SciPy Trac scipy-tickets@scipy....
Wed Jun 16 23:50:53 CDT 2010


#1203: stats: chi-square test of independence
----------------------------------------------+-----------------------------
 Reporter:  warren.weckesser                  |       Owner:  somebody
     Type:  enhancement                       |      Status:  new     
 Priority:  normal                            |   Milestone:  0.8.0   
Component:  scipy.stats                       |     Version:  0.7.0   
 Keywords:  chi-square chisquare chi-squared  |  
----------------------------------------------+-----------------------------
 The attached file, chisquare_nway.py, includes the function
 chisquare_nway() that computes a chi-square test of independence for an
 n-dimensional array.  I think this would be a nice enhancement for
 scipy.stats.  A discussion on this topic (and an early, limited version of
 the function) can be found here:

     http://mail.scipy.org/pipermail/scipy-dev/2010-June/014538.html

 Inspired by that discussion, I generalized the function, added the
 optional Yates' correction for continuity, and figured out how to do the
 equivalent calculation in R for comparison.

 For now, I have attached just a standalone python file.  After getting
 some feedback (especially about the API), I'll create
 a patch containing the code, tests, and updates to the module
 docs and release notes.

 Some additional comments about the code:

 I have included the degrees of freedom and the table of expected
 frequencies in the output.  These are convenient for comparing with R,
 and they are handy to have available in any case.
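
 To make those two outputs concrete, here is a minimal sketch of how the
 expected frequencies and degrees of freedom can be computed under the
 hypothesis that all factors are mutually independent.  It is not the
 code from the attached file; the name expected_and_dof is hypothetical,
 and the formulas used (product of the marginal sums divided by
 N**(d-1), and size - 1 - sum(n_i - 1)) are the standard ones, assumed
 here to be what chisquare_nway reports.

```python
from functools import reduce

import numpy as np


def expected_and_dof(observed):
    """Expected frequencies and degrees of freedom under mutual independence.

    Hypothetical helper for illustration; not the attached chisquare_nway().
    """
    observed = np.asarray(observed, dtype=float)
    n = observed.sum()
    d = observed.ndim

    # One marginal-sum array per axis, reshaped so that they broadcast
    # against each other to the full shape of `observed`.
    margins = []
    for axis in range(d):
        other_axes = tuple(k for k in range(d) if k != axis)
        shape = [1] * d
        shape[axis] = observed.shape[axis]
        margins.append(observed.sum(axis=other_axes).reshape(shape))

    # E = N * prod_i (margin_i / N) = prod_i(margin_i) / N**(d - 1)
    expected = reduce(np.multiply, margins) / n ** (d - 1)

    # Number of cells, minus 1, minus the free parameters of each margin.
    dof = observed.size - 1 - sum(s - 1 for s in observed.shape)
    return expected, dof


# The 2x4x3 table used in the three-way example in this ticket.
data = np.array(
    [[[12, 34, 23], [35, 31, 11], [12, 32, 9], [12, 12, 14]],
     [[4, 47, 11], [34, 10, 18], [18, 13, 19], [9, 33, 25]]])

expected, dof = expected_and_dof(data)
chi2 = ((data - expected) ** 2 / expected).sum()
```

 For this table, dof is 24 - 1 - (1 + 3 + 2) = 17, the same value that
 both R and chisquare_nway report for the three-way example.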

 I implemented the Yates correction for continuity, but it is only
 allowed when there is 1 degree of freedom.  Everything I have read
 suggests that the correction applies to this case only, but I
 have not dug very deeply.  In particular, I haven't looked up the
 original reference.
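
 For reference, a minimal sketch of the correction in the 2x2 case
 (1 degree of freedom).  It assumes the textbook form of Yates'
 correction, in which 0.5 is subtracted from each |observed - expected|
 before squaring; the attached file may differ in detail.  The function
 name yates_chi2_2x2 and the sample table are hypothetical.

```python
import numpy as np


def yates_chi2_2x2(table, correction=True):
    """Chi-square statistic for a 2x2 table, optionally Yates-corrected.

    Hypothetical helper for illustration; not the attached chisquare_nway().
    """
    observed = np.asarray(table, dtype=float)
    if observed.shape != (2, 2):
        raise ValueError("this sketch only handles 2x2 tables (dof = 1)")

    # Expected frequencies under independence of rows and columns.
    row = observed.sum(axis=1, keepdims=True)
    col = observed.sum(axis=0, keepdims=True)
    expected = row * col / observed.sum()

    diff = np.abs(observed - expected)
    if correction:
        # Textbook Yates correction: shrink each deviation by 0.5.
        diff = diff - 0.5
    return (diff ** 2 / expected).sum()


# Made-up counts, for illustration only.
table = [[12, 5], [7, 9]]
uncorrected = yates_chi2_2x2(table, correction=False)
corrected = yates_chi2_2x2(table, correction=True)
```

 With these counts every |observed - expected| exceeds 0.5, so the
 corrected statistic is smaller than the uncorrected one, which makes
 the test more conservative.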

 As far as I can tell, R's chisq.test does not handle three-way
 or higher tests.  chisq.test does a one-way test of goodness of
 fit, or a two-way test of independence.  So I would like to
 emphasize that chisquare_nway is *not* an attempt to clone the
 R function chisq.test.  chisquare_nway does not do the 'one-way'
 goodness of fit test; use stats.chisquare for that.

 The file chisq4x3x2.r contains R code that *does* do a three-way
 test.  This code prints the following:
 {{{
 Call: xtabs(formula = count ~ r + c + t)
 Number of cases in table: 478
 Number of factors: 3
 Test for independence of all factors:
         Chisq = 102.17, df = 17, p-value = 3.514e-14
 }}}

 The equivalent calculation using chisquare_nway:
 {{{
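 >>> import numpy as np
 >>> from chisquare_nway import chisquare_nway
 >>> data = np.array(
 ...     [[[12, 34, 23],
 ...       [35, 31, 11],
 ...       [12, 32,  9],
 ...       [12, 12, 14]],
 ...      [[ 4, 47, 11],
 ...       [34, 10, 18],
 ...       [18, 13, 19],
 ...       [ 9, 33, 25]]])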
 >>> chisquare_nway(data)
 (102.17314893322093,
  3.5141225742891105e-14,
  17,
  array([[[ 18.48003361,  28.80711122,  17.66473801],
         [ 19.60858528,  30.56632412,  18.74350064],
         [ 14.53010276,  22.64986607,  13.88906882],
         [ 14.81224068,  23.0896693 ,  14.15875948]],

        [[ 18.79193291,  29.29330719,  17.96287705],
         [ 19.93953187,  31.08221145,  19.05984664],
         [ 14.77533657,  23.03214229,  14.12348348],
         [ 15.06223631,  23.47936836,  14.39772588]]]))
 }}}

 Similarly, chisq2x2x2x2.r prints:
 {{{
 Call: xtabs(formula = data ~ r + c + d + t)
 Number of cases in table: 262
 Number of factors: 4
 Test for independence of all factors:
         Chisq = 8.758, df = 11, p-value = 0.6442
 }}}

 This is the same data as the second example in the docstring.
 chisquare_nway matches R to the precision printed by R.

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1203>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.