# [Scipy-tickets] [SciPy] #1203: stats: chi-square test of independence

SciPy Trac scipy-tickets@scipy....
Wed Jun 16 23:50:53 CDT 2010

```
#1203: stats: chi-square test of independence
----------------------------------------------+-----------------------------
 Reporter:  warren.weckesser                  |       Owner:  somebody
     Type:  enhancement                       |      Status:  new
 Priority:  normal                            |   Milestone:  0.8.0
Component:  scipy.stats                       |     Version:  0.7.0
 Keywords:  chi-square chisquare chi-squared  |
----------------------------------------------+-----------------------------
The attached file, chisquare_nway.py, includes the function
chisquare_nway() that computes a chi-square test of independence for an
n-dimensional array.  I think this would be a nice enhancement for
scipy.stats.  A discussion on this topic (and an early, limited version of
the function) can be found here:

http://mail.scipy.org/pipermail/scipy-dev/2010-June/014538.html

Inspired by that discussion, I generalized the function, added the
optional Yates' correction for continuity, and figured out how to do the
equivalent calculation in R for comparison.

For now, I have attached just a standalone python file.  After getting
some feedback (especially about the API), I'll create
a patch containing the code, tests, and updates to the module
docs and release notes.

I have included the degrees of freedom and the table of expected
frequencies in the output.  These are convenient for comparing with R,
and they are handy to have available in any case.

I implemented the Yates correction for continuity, but it is allowed
only when there is 1 degree of freedom.  Everything I have read
suggests that the correction applies to this case only, but I
have not dug very deeply.  In particular, I haven't looked up the
original reference.

As far as I can tell, R's chisq.test does not handle three-way
or higher tests.  chisq.test does a one-way test of goodness of
fit, or a two-way test of independence.  So I would like to
emphasize that chisquare_nway is *not* an attempt to clone the
R function chisq.test.  chisquare_nway does not do the 'one-way'
goodness of fit test; use stats.chisquare for that.

The file chisq4x3x2.r contains R code that *does* do a three-way
test.  This code prints the following:
{{{
Call: xtabs(formula = count ~ r + c + t)
Number of cases in table: 478
Number of factors: 3
Test for independence of all factors:
Chisq = 102.17, df = 17, p-value = 3.514e-14
}}}

The equivalent calculation using chisquare_nway:
{{{
>>> import numpy as np
>>> data = np.array(
...     [[[12, 34, 23],
...       [35, 31, 11],
...       [12, 32,  9],
...       [12, 12, 14]],
...      [[ 4, 47, 11],
...       [34, 10, 18],
...       [18, 13, 19],
...       [ 9, 33, 25]]])
>>> chisquare_nway(data)
(102.17314893322093,
 3.5141225742891105e-14,
 17,
 array([[[ 18.48003361,  28.80711122,  17.66473801],
         [ 19.60858528,  30.56632412,  18.74350064],
         [ 14.53010276,  22.64986607,  13.88906882],
         [ 14.81224068,  23.0896693 ,  14.15875948]],

        [[ 18.79193291,  29.29330719,  17.96287705],
         [ 19.93953187,  31.08221145,  19.05984664],
         [ 14.77533657,  23.03214229,  14.12348348],
         [ 15.06223631,  23.47936836,  14.39772588]]]))
}}}

Similarly, chisq2x2x2x2.r prints:
{{{
Call: xtabs(formula = data ~ r + c + d + t)
Number of cases in table: 262
Number of factors: 4
Test for independence of all factors:
Chisq = 8.758, df = 11, p-value = 0.6442
}}}

This is the same data as the second example in the docstring.
chisquare_nway matches R to the precision printed by R.

--
Ticket URL: <http://projects.scipy.org/scipy/ticket/1203>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.
```
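
The attached chisquare_nway.py is not reproduced in this message, so the following is only a minimal sketch of the calculation the ticket describes, not the attached implementation. The function name `chisquare_nway_sketch`, its `correction` parameter, and the return order are assumptions chosen to mirror the output shown above. It assumes the standard model of mutual independence of all factors: each expected count is the grand total times the product of the marginal proportions, and the degrees of freedom are the number of cells, minus the sum of the dimension sizes, plus the number of dimensions, minus 1.

```python
import numpy as np
from scipy.stats import chi2


def chisquare_nway_sketch(observed, correction=False):
    """Chi-square test of mutual independence for an n-dimensional table.

    Hypothetical stand-in for the ticket's chisquare_nway(); returns
    (statistic, p_value, dof, expected).
    """
    observed = np.asarray(observed, dtype=float)
    if observed.ndim < 2:
        raise ValueError("need at least a two-way table")

    total = observed.sum()

    # Expected count under independence:
    # total * product over axes of (marginal total / total).
    expected = np.full(observed.shape, total)
    for axis in range(observed.ndim):
        other_axes = tuple(a for a in range(observed.ndim) if a != axis)
        marginal = observed.sum(axis=other_axes, keepdims=True)
        expected = expected * (marginal / total)

    # Cells minus the independently estimated marginal parameters.
    dof = observed.size - sum(observed.shape) + observed.ndim - 1

    diff = np.abs(observed - expected)
    if correction:
        # Yates' correction, restricted to dof == 1 as in the ticket.
        if dof != 1:
            raise ValueError("Yates' correction requires dof == 1")
        diff = diff - 0.5

    statistic = float((diff ** 2 / expected).sum())
    p_value = float(chi2.sf(statistic, dof))
    return statistic, p_value, dof, expected


# The 2x4x3 table from the ticket.
data = np.array(
    [[[12, 34, 23],
      [35, 31, 11],
      [12, 32,  9],
      [12, 12, 14]],
     [[ 4, 47, 11],
      [34, 10, 18],
      [18, 13, 19],
      [ 9, 33, 25]]])

stat, p, dof, expected = chisquare_nway_sketch(data)

# A 2x2 table has dof == 1, so the correction is permitted there.
small = np.array([[10, 20], [30, 40]])
stat_plain, _, dof_small, _ = chisquare_nway_sketch(small)
stat_yates, _, _, _ = chisquare_nway_sketch(small, correction=True)
```

For the 2x4x3 table this gives 17 degrees of freedom (24 - 9 + 3 - 1), which agrees with the df reported by R above. The sketch applies the plain form of Yates' correction (subtract 0.5 from each absolute difference); whether the attached file does exactly that is not stated in the ticket.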