[SciPy-Dev] chi-square test for a contingency (R x C) table
josef.pktd@gmai...
Wed Jun 2 00:23:16 CDT 2010
On Wed, Jun 2, 2010 at 12:28 AM, Warren Weckesser
<warren.weckesser@enthought.com> wrote:
> I've been digging into some basic statistics recently, and developed the
> following function for applying the chi-square test to a contingency
> table. Does something like this already exist in scipy.stats? If not,
> any objections to adding it? (Tests are already written :)
There is no test like this yet in scipy.stats, and I think it is a
good addition.
My main question, which maybe Bruce can answer, is whether the
function should allow more than 2 dimensions. The function would be
easy to generalize, but I don't know how common a test for
independence in, for example, an (R x C x D) table is.
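To make the generalization concrete, here is a rough sketch of a test for mutual independence in a table with any number of dimensions. The names expected_under_independence and chisquare_nway are hypothetical, not existing scipy.stats functions, and the sketch assumes the hypothesis being tested is mutual independence of all factors (other hypotheses for multi-way tables would need different expected counts and degrees of freedom).

```python
import numpy as np
from scipy.stats import chi2


def expected_under_independence(table):
    """Expected counts for an N-way table under mutual independence.

    Hypothetical helper: the expected count in a cell is the grand total
    times the product of the marginal proportions of each factor.
    """
    table = np.asarray(table, dtype=float)
    total = table.sum()
    expected = np.full(table.shape, total)
    for axis in range(table.ndim):
        # Sum over all other axes to get the marginal counts for this axis.
        other_axes = tuple(i for i in range(table.ndim) if i != axis)
        marginal = table.sum(axis=other_axes) / total
        # Reshape so the marginal broadcasts along its own axis only.
        shape = [1] * table.ndim
        shape[axis] = table.shape[axis]
        expected = expected * marginal.reshape(shape)
    return expected


def chisquare_nway(table):
    """Chi-square statistic, p-value, dof and expected counts (sketch)."""
    table = np.asarray(table, dtype=float)
    expected = expected_under_independence(table)
    stat = ((table - expected) ** 2 / expected).sum()
    # (number of cells - 1) minus the number of estimated marginal
    # parameters; for a 2D table this reduces to (nr - 1)*(nc - 1).
    dof = table.size - 1 - sum(n - 1 for n in table.shape)
    p = chi2.sf(stat, dof)
    return stat, p, dof, expected


# A 2 x 2 x 2 table with identical counts is perfectly independent.
stat3, p3, dof3, expected3 = chisquare_nway(np.full((2, 2, 2), 10))
# A 2 x 2 table, to check the 2D special case.
stat2, p2, dof2, expected2 = chisquare_nway([[10, 20], [30, 40]])
```

For a 2D table the degrees of freedom come out as (nr - 1)*(nc - 1), so the 2D function would be a special case of this.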
(Options could still be added later without changing the API, in case
there are any.) I would also look briefly at the R manual, to see what
features their test has.
(I'm not a real user of contingency tables)
I think the docstring should mention that this is a test for
independence, and that it is only appropriate if the expected count in
each cell is at least 5 (off the top of my head). Suggested summary line:
"Chi-square test for independence in a contingency (R x C) table"
Is (R x C) standard notation (the letters)?
As for dof_adjust, I would have to check.
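As an illustration of the expected-count condition, a small check like the following could flag tables where the chi-square approximation may be unreliable. The helper names and the default threshold of 5 are assumptions for this sketch (the threshold is the rule of thumb mentioned above, not something enforced by scipy.stats).

```python
import numpy as np


def expected_counts(table):
    """Expected frequencies of a 2D table under independence."""
    table = np.asarray(table, dtype=float)
    row_sum = table.sum(axis=1).reshape(-1, 1)
    col_sum = table.sum(axis=0)
    return row_sum * col_sum / table.sum()


def small_expected_cells(table, threshold=5):
    """Return (row, col) indices of cells with expected count < threshold.

    Hypothetical helper; the threshold is a rule of thumb only.
    """
    expected = expected_counts(table)
    return np.argwhere(expected < threshold)


# All expected counts are comfortably above 5 here.
ok_cells = small_expected_cells([[10, 20], [30, 40]])
# The sparse first row pushes three expected counts below 5.
sparse_cells = small_expected_cells([[1, 2], [3, 40]])
```

An empty result means every cell meets the rule of thumb; otherwise the returned indices show which cells are too sparse.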
Can you open a ticket, mainly for the record, but also to see if there
are any useful generalizations?
But I think it can go in.
A comment:
The function matches the pattern of the current scipy.stats functions,
but in statsmodels I would most likely also make the expected values
available, so that users can directly compare data and expected
values.
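Roughly what I have in mind, as a sketch only: a variant that also returns the degrees of freedom and the expected table. The name chisquare_contingency_full is made up for this example, and the return signature is an assumption, not a settled API.

```python
import numpy as np
from scipy.stats import chisquare


def chisquare_contingency_full(table):
    """Sketch: return chi2, p, dof and the expected frequencies."""
    table = np.asarray(table, dtype=float)
    if table.ndim != 2:
        raise ValueError("table must be a 2D array.")
    row_sum = table.sum(axis=1).reshape(-1, 1)
    col_sum = table.sum(axis=0)
    expected = row_sum * col_sum / table.sum()
    nr, nc = table.shape
    dof = (nr - 1) * (nc - 1)
    # chisquare uses k - 1 - ddof degrees of freedom for k cells,
    # so shift its default down to the contingency-table dof.
    stat, p = chisquare(table.ravel(), expected.ravel(),
                        ddof=(table.size - 1) - dof)
    return stat, p, dof, expected


observed = np.array([[10, 20], [30, 40]])
stat, p, dof, expected = chisquare_contingency_full(observed)
# Observed minus expected, cell by cell.
residuals = observed - expected
```

With the expected table in hand, the user can look at the residuals directly to see which cells drive the statistic.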
Thanks,
Josef
>
> Warren
>
> -----
>
> import numpy as np
> from scipy.stats import chisquare
>
> def chisquare_contingency(table):
>     """Chi-square calculation for a contingency (R x C) table.
>
>     This function computes the chi-square statistic and p-value of the
>     data in the table.  The expected frequencies are computed based on
>     the relative frequencies in the table.
>
>     Parameters
>     ----------
>     table : array_like, 2D
>         The contingency table, also known as the R x C table.
>
>     Returns
>     -------
>     chi2 : float
>         The chi-square test statistic.
>     p : float
>         The p-value of the test.
>     """
>     table = np.asarray(table)
>     if table.ndim != 2:
>         raise ValueError("table must be a 2D array.")
>
>     # Create the table of expected frequencies.
>     total = table.sum()
>     row_sum = table.sum(axis=1).reshape(-1, 1)
>     col_sum = table.sum(axis=0)
>     expected = row_sum * col_sum / float(total)
>
>     # Since we are passing in 1D arrays of length table.size, the default
>     # number of degrees of freedom is table.size - 1.
>     # For a contingency table, the actual number of degrees of freedom is
>     # (nr - 1)*(nc - 1).  We use the ddof argument
>     # of the chisquare function to adjust the default.
>     nr, nc = table.shape
>     dof = (nr - 1) * (nc - 1)
>     dof_adjust = (table.size - 1) - dof
>
>     chi2, p = chisquare(np.ravel(table), np.ravel(expected),
>                         ddof=dof_adjust)
>     return chi2, p
>
> -----
>