[Scipy-tickets] [SciPy] #1203: stats: chi-square test of independence
SciPy Trac
scipy-tickets@scipy....
Thu Jun 17 00:06:10 CDT 2010
#1203: stats: chi-square test of independence
----------------------------------------------+-----------------------------
Reporter: warren.weckesser | Owner: somebody
Type: enhancement | Status: new
Priority: normal | Milestone: 0.8.0
Component: scipy.stats | Version: 0.7.0
Keywords: chi-square chisquare chi-squared |
----------------------------------------------+-----------------------------
Old description:
> The attached file, chisquare_nway.py, includes the function
> chisquare_nway() that computes a chi-square test of independence for an
> n-dimensional array. I think this would be a nice enhancement for
> scipy.stats. A discussion on this topic (and an early, limited version
> of the function) can be found here:
>
> http://mail.scipy.org/pipermail/scipy-dev/2010-June/014538.html
>
> Inspired by that discussion, I generalized the function, added the
> optional Yates' correction for continuity, and figured out how to do the
> equivalent calculation in R for comparison.
>
> For now, I have attached just a standalone Python file. After getting
> some feedback (especially about the API), I'll create
> a patch containing the code, tests, and updates to the module
> docs and release notes.
>
> Some additional comments about the code:
>
> I have included the degrees of freedom and the table of expected
> frequencies in the output. This is convenient for comparing to R, and
> they are just handy to have available.
>
> I implemented the Yates correction for continuity, but it is only
> allowed when the degrees of freedom is 1. Everything I have read
> seems to suggest that the correction is for this case only, but I
> have not dug very deeply. In particular, I haven't looked up the
> original reference.
>
> As far as I can tell, R's chisq.test does not handle three-way
> or higher tests. chisq.test does a one-way test of goodness of
> fit, or a two-way test of independence. So I would like to
> emphasize that chisquare_nway is *not* an attempt to clone the
> R function chisq.test. chisquare_nway does not do the 'one-way'
> goodness of fit test; use stats.chisquare for that.
>
> The file chisq4x3x2.r contains R code that *does* do a three-way
> test. This code prints the following:
> {{{
> Call: xtabs(formula = count ~ r + c + t)
> Number of cases in table: 478
> Number of factors: 3
> Test for independence of all factors:
> Chisq = 102.17, df = 17, p-value = 3.514e-14
> }}}
>
> The equivalent calculation using chisquare_nway:
> {{{
> >>> data = np.array(
> ...     [[[12, 34, 23],
> ...       [35, 31, 11],
> ...       [12, 32,  9],
> ...       [12, 12, 14]],
> ...      [[ 4, 47, 11],
> ...       [34, 10, 18],
> ...       [18, 13, 19],
> ...       [ 9, 33, 25]]])
> >>> chisquare_nway(data)
> (102.17314893322093,
>  3.5141225742891105e-14,
>  17,
>  array([[[ 18.48003361,  28.80711122,  17.66473801],
>          [ 19.60858528,  30.56632412,  18.74350064],
>          [ 14.53010276,  22.64986607,  13.88906882],
>          [ 14.81224068,  23.0896693 ,  14.15875948]],
>
>         [[ 18.79193291,  29.29330719,  17.96287705],
>          [ 19.93953187,  31.08221145,  19.05984664],
>          [ 14.77533657,  23.03214229,  14.12348348],
>          [ 15.06223631,  23.47936836,  14.39772588]]]))
> }}}
>
> Similarly, chisq2x2x2x2.r prints:
> {{{
> Call: xtabs(formula = data ~ r + c + d + t)
> Number of cases in table: 262
> Number of factors: 4
> Test for independence of all factors:
> Chisq = 8.758, df = 11, p-value = 0.6442
> }}}
>
> This is the same data as the second example in the docstring.
> chisquare_nway matches R to the precision printed by R.
New description:
The attached file, chisquare_nway.py, includes the function
chisquare_nway() that computes a chi-square test of independence for an
n-dimensional array. I think this would be a nice enhancement for
scipy.stats. A discussion on this topic (and an early, limited version of
the function) can be found here:

http://mail.scipy.org/pipermail/scipy-dev/2010-June/014538.html

Inspired by that discussion, I generalized the function, added the
optional Yates' correction for continuity, and figured out how to do the
equivalent calculation in R for comparison.

For now, I have attached just a standalone Python file. After getting
some feedback (especially about the API), I'll create
a patch containing the code, tests, and updates to the module
docs and release notes.

Some additional comments about the code:

I have included the degrees of freedom and the table of expected
frequencies in the output. This is convenient for comparing to R, and
they are just handy to have available.
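
A minimal sketch of how these two extra outputs could be computed under
the hypothesis that all factors are mutually independent, using only
NumPy. The name expected_freq_sketch is hypothetical and is not taken
from chisquare_nway.py; the formulas are the standard ones, not
necessarily how the attached file does it.

```python
import numpy as np


def expected_freq_sketch(observed):
    """Expected frequencies and degrees of freedom for an n-way table.

    Hypothetical helper for illustration; not the code in
    chisquare_nway.py.
    """
    observed = np.asarray(observed, dtype=float)
    d = observed.ndim
    total = observed.sum()

    # Start from the grand total and multiply by each factor's marginal
    # proportions, reshaped so they broadcast along their own axis.
    expected = np.full(observed.shape, total)
    for k in range(d):
        other_axes = tuple(i for i in range(d) if i != k)
        marginal = observed.sum(axis=other_axes)
        shape = [1] * d
        shape[k] = observed.shape[k]
        expected = expected * (marginal.reshape(shape) / total)

    # Number of cells minus the number of fitted marginal parameters:
    # size - sum(shape) + d - 1.
    dof = observed.size - sum(observed.shape) + d - 1
    return expected, dof


# The 2 x 4 x 3 table used in the comparison with R.
data = np.array(
    [[[12, 34, 23], [35, 31, 11], [12, 32, 9], [12, 12, 14]],
     [[4, 47, 11], [34, 10, 18], [18, 13, 19], [9, 33, 25]]])
expected, dof = expected_freq_sketch(data)
print(dof)  # 17
```

The expected table sums to the same total as the observed table, which
is a quick sanity check when comparing against R.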

I implemented the Yates correction for continuity, but it is only
allowed when there is 1 degree of freedom. Everything I have read
seems to suggest that the correction is for this case only, but I
have not dug very deeply. In particular, I haven't looked up the
original reference.
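
For reference, a minimal sketch of the textbook Yates-corrected
statistic for a 2 x 2 table (the only two-way shape with 1 degree of
freedom). The function name yates_chi2_2x2 and the example counts are
made up for illustration and do not come from chisquare_nway.py. The
p-value uses the fact that, for 1 degree of freedom, the chi-square
survival function equals erfc(sqrt(x/2)).

```python
import math

import numpy as np


def yates_chi2_2x2(table):
    """Yates-corrected chi-square statistic and p-value for a 2 x 2 table.

    Hypothetical helper for illustration only.
    """
    observed = np.asarray(table, dtype=float)
    if observed.shape != (2, 2):
        raise ValueError("the correction is applied only to 2 x 2 tables")

    total = observed.sum()
    # Expected counts under independence of rows and columns.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / total

    # Textbook form: shrink each |O - E| by 0.5 before squaring.
    stat = ((np.abs(observed - expected) - 0.5) ** 2 / expected).sum()

    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up counts, chosen so the arithmetic is easy to follow:
# every |O - E| is 2, so the corrected deviations are 1.5.
stat, p_value = yates_chi2_2x2([[10, 20], [30, 40]])
print(round(stat, 6))  # 0.446429
```

Without the correction the same table gives about 0.7937, so the
correction always moves the statistic toward zero.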

As far as I can tell, R's chisq.test does not handle three-way
or higher tests. chisq.test does a one-way test of goodness of
fit, or a two-way test of independence. So I would like to
emphasize that chisquare_nway is *not* an attempt to clone the
R function chisq.test. chisquare_nway does not do the 'one-way'
goodness of fit test; use stats.chisquare for that.

The file chisq4x3x2.r contains R code that *does* do a three-way
test. This code prints the following:
{{{
Call: xtabs(formula = data ~ r + c + t)
Number of cases in table: 478
Number of factors: 3
Test for independence of all factors:
Chisq = 102.17, df = 17, p-value = 3.514e-14
}}}
The equivalent calculation using chisquare_nway:
{{{
>>> import numpy as np
>>> data = np.array(
...     [[[12, 34, 23],
...       [35, 31, 11],
...       [12, 32,  9],
...       [12, 12, 14]],
...      [[ 4, 47, 11],
...       [34, 10, 18],
...       [18, 13, 19],
...       [ 9, 33, 25]]])
>>> chisquare_nway(data)
(102.17314893322093,
 3.5141225742891105e-14,
 17,
 array([[[ 18.48003361,  28.80711122,  17.66473801],
         [ 19.60858528,  30.56632412,  18.74350064],
         [ 14.53010276,  22.64986607,  13.88906882],
         [ 14.81224068,  23.0896693 ,  14.15875948]],

        [[ 18.79193291,  29.29330719,  17.96287705],
         [ 19.93953187,  31.08221145,  19.05984664],
         [ 14.77533657,  23.03214229,  14.12348348],
         [ 15.06223631,  23.47936836,  14.39772588]]]))
}}}
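
The first three values in that output can be cross-checked
independently of chisquare_nway.py. The sketch below is an assumption
about the calculation (Pearson's statistic against the
mutual-independence expected table, with the p-value from
scipy.stats.chi2.sf), not a copy of the attached code, and the name
chi2_independence_sketch is hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def chi2_independence_sketch(observed):
    """Pearson chi-square test of mutual independence for an n-way table.

    Hypothetical re-derivation for checking results; not the attached
    chisquare_nway().
    """
    observed = np.asarray(observed, dtype=float)
    d = observed.ndim
    total = observed.sum()

    # Expected counts: grand total times the product of the marginal
    # proportions of every factor.
    expected = np.full(observed.shape, total)
    for k in range(d):
        other_axes = tuple(i for i in range(d) if i != k)
        shape = [1] * d
        shape[k] = observed.shape[k]
        expected = expected * (observed.sum(axis=other_axes).reshape(shape) / total)

    stat = ((observed - expected) ** 2 / expected).sum()
    dof = observed.size - sum(observed.shape) + d - 1
    p_value = chi2.sf(stat, dof)  # upper tail of the chi-square distribution
    return stat, p_value, dof


data = np.array(
    [[[12, 34, 23], [35, 31, 11], [12, 32, 9], [12, 12, 14]],
     [[4, 47, 11], [34, 10, 18], [18, 13, 19], [9, 33, 25]]])
stat, p_value, dof = chi2_independence_sketch(data)
print(round(stat, 2), dof)
```

With the table above this should agree with the Chisq, df and p-value
lines printed by R, up to the precision R prints.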
Similarly, chisq2x2x2x2.r prints:
{{{
Call: xtabs(formula = data ~ r + c + d + t)
Number of cases in table: 262
Number of factors: 4
Test for independence of all factors:
Chisq = 8.758, df = 11, p-value = 0.6442
}}}
This is the same data as the second example in the docstring.
chisquare_nway matches R to the precision printed by R.
--
Comment(by warren.weckesser):
Fixed the R output; the original was produced by an older version of
the R code.
--
Ticket URL: <http://projects.scipy.org/scipy/ticket/1203#comment:1>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.