[SciPy-Dev] chi-square test for a contingency (R x C) table

josef.pktd@gmai...
Wed Jun 2 00:23:16 CDT 2010


On Wed, Jun 2, 2010 at 12:28 AM, Warren Weckesser
<warren.weckesser@enthought.com> wrote:
> I've been digging into some basic statistics recently, and developed the
> following function for applying the chi-square test to a contingency
> table.  Does something like this already exist in scipy.stats? If not,
> any objections to adding it?  (Tests are already written :)

There is no test like this yet in scipy.stats, and I think it is a
good addition.

My main question, which maybe Bruce can answer, is whether the
function should allow more than 2 dimensions. The function would be
easy to generalize, but I don't know how common, for example, the
test for independence in an (R x C x D) table is.
(If there are useful options, they could still be added later without
changing the API.) I would also take a brief look at the R manual to
see what features its test has.
(I'm not a real user of contingency tables.)
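On the R x C x D question, here is a minimal sketch of the N-dimensional generalization, assuming the hypothesis of interest is mutual (complete) independence of all factors. Other hypotheses, such as conditional independence, would need different expected counts and degrees of freedom. The function name chisquare_independence_nd is hypothetical and is not part of scipy.stats.

```python
import numpy as np
from scipy.stats import chi2


def chisquare_independence_nd(table):
    """Chi-square test of mutual independence for an N-dimensional table.

    Hypothetical sketch: the expected count in each cell is
    total * prod_k (marginal_k / total), i.e. the product of the
    one-dimensional marginal sums divided by total**(ndim - 1).
    """
    table = np.asarray(table, dtype=float)
    total = table.sum()
    expected = np.full(table.shape, total)
    for axis in range(table.ndim):
        # Sum over every axis except `axis` to get that dimension's marginal.
        other_axes = tuple(a for a in range(table.ndim) if a != axis)
        marginal = table.sum(axis=other_axes)
        # Reshape so the marginal broadcasts along `axis` only.
        shape = [1] * table.ndim
        shape[axis] = -1
        expected = expected * marginal.reshape(shape) / total
    # Number of cells, minus the estimated marginal parameters, minus one.
    dof = table.size - sum(n - 1 for n in table.shape) - 1
    stat = ((table - expected) ** 2 / expected).sum()
    return stat, chi2.sf(stat, dof), dof, expected
```

For a 2-D table the degrees of freedom reduce to (R - 1)*(C - 1), so in that case the sketch agrees with the 2-D function in the quoted message.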

I think the docstring should mention that this is a test for
independence, and that it is only appropriate if the expected count in
each cell is at least 5 (off the top of my head).
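A rough sketch of how a caller (or the function itself) might check that rule of thumb. The helper name expected_counts_ok and its min_expected parameter are hypothetical, and the threshold of 5 is only the informal rule mentioned above; variants of the rule exist.

```python
import numpy as np


def expected_counts_ok(table, min_expected=5.0):
    """Return (ok, expected) for the 'expected count >= 5' rule of thumb.

    Hypothetical helper; the threshold is an informal guideline,
    not a hard requirement of the chi-square test.
    """
    table = np.asarray(table, dtype=float)
    if table.ndim != 2:
        raise ValueError("table must be a 2D array.")
    # Expected counts under independence: row total * column total / grand total.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return bool((expected >= min_expected).all()), expected
```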

"Chi-square test for independence in a contingency (R x C) table"
Is (R x C) the standard notation (i.e., those letters)?

As for dof_adjust, I would have to check.
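For reference, scipy.stats.chisquare computes its p-value with k - 1 - ddof degrees of freedom, where k is the number of observed frequencies. With k = R*C and ddof = (R*C - 1) - (R - 1)*(C - 1), that leaves (R - 1)*(C - 1). A quick numerical check, assuming a SciPy version whose chisquare accepts ddof; the table values are made up for illustration:

```python
import numpy as np
from scipy.stats import chi2, chisquare

# Illustrative 2 x 3 table of counts (made-up values).
table = np.array([[10, 20, 30], [40, 15, 25]], dtype=float)
nr, nc = table.shape
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()

# chisquare uses k - 1 - ddof degrees of freedom, where k = table.size.
dof = (nr - 1) * (nc - 1)
dof_adjust = (table.size - 1) - dof  # chosen so that k - 1 - ddof == dof

stat, p = chisquare(table.ravel(), expected.ravel(), ddof=dof_adjust)

# Reference p-value computed directly from the chi-square distribution.
p_direct = chi2.sf(stat, dof)
```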

Can you open a ticket, mainly for the record, but also to see whether
there are any useful generalizations?
But I think it can go in.

A comment:
The function matches the pattern of the current scipy.stats functions,
but in statsmodels I would most likely also make the expected values
available, so that users can directly compare data and expected
values.
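To illustrate comparing data with expected values, here is a sketch that returns the expected counts together with the Pearson residuals, (observed - expected) / sqrt(expected); the squared residuals sum to the chi-square statistic. The helper name is hypothetical and is not a statsmodels or scipy.stats API.

```python
import numpy as np


def contingency_expected_and_residuals(table):
    """Return expected counts and Pearson residuals for a 2D table.

    Hypothetical helper. The Pearson residual of a cell is
    (observed - expected) / sqrt(expected); the chi-square statistic
    is the sum of their squares.
    """
    table = np.asarray(table, dtype=float)
    if table.ndim != 2:
        raise ValueError("table must be a 2D array.")
    # Expected counts under independence: outer product of the margins / total.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    residuals = (table - expected) / np.sqrt(expected)
    return expected, residuals


observed = np.array([[10, 20], [30, 40]])
expected, residuals = contingency_expected_and_residuals(observed)
stat = (residuals ** 2).sum()  # 50/63, about 0.794
```

Here the expected table is [[12, 18], [28, 42]], so each cell's deviation from independence can be read off directly.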

Thanks,

Josef

>
> Warren
>
> -----
>
> import numpy as np
> from scipy.stats import chisquare
>
> def chisquare_contingency(table):
>    """Chi-square calculation for a contingency (R x C) table.
>
>    This function computes the chi-square statistic and p-value of the
>    data in the table.  The expected frequencies are computed from the
>    row and column totals (the marginal sums) of the table.
>
>    Parameters
>    ----------
>    table : array_like, 2D
>        The contingency table, also known as the R x C table.
>
>    Returns
>    -------
>    chi2 : float
>        The chi-square test statistic.
>    p : float
>        The p-value of the test.
>    """
>    table = np.asarray(table)
>    if table.ndim != 2:
>        raise ValueError("table must be a 2D array.")
>
>    # Create the table of expected frequencies.
>    total = table.sum()
>    row_sum = table.sum(axis=1).reshape(-1,1)
>    col_sum = table.sum(axis=0)
>    expected = row_sum * col_sum / float(total)
>
>    # Since we are passing in 1D arrays of length table.size, the default
>    # number of degrees of freedom is table.size - 1.
>    # For a contingency table, the actual number of degrees of freedom is
>    # (nr - 1)*(nc - 1).  We use the ddof argument
>    # of the chisquare function to adjust the default.
>    nr, nc = table.shape
>    dof = (nr - 1) * (nc - 1)
>    dof_adjust = (table.size - 1) - dof
>
>    chi2, p = chisquare(np.ravel(table), np.ravel(expected),
>                        ddof=dof_adjust)
>    return chi2, p
>
> -----
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>

