[SciPy-Dev] chi-square test for a contingency (R x C) table
Wed Jun 2 13:41:47 CDT 2010
On Wed, Jun 2, 2010 at 2:18 PM, Neil Martinsen-Burrell <firstname.lastname@example.org> wrote:
> On 2010-06-02 13:10 , Bruce Southey wrote:
>>>> However, this code is the chi-squared test part as SAS will compute the
>>>> actual cell numbers. Also an extension to scipy.stats.chisquare() so we
>>>> can not have both functions.
>>> Again, I don't understand what you mean that we can't have both
>>> functions? I believe (from a statistics teacher's point of view) that
>>> the Chi-Squared goodness of fit test (which is stats.chisquare) is a
>>> different beast from the Chi-Square test for independence (which is
>>> stats.chisquare_contingency). The fact that the distribution of the
>>> test statistic is the same should not tempt us to put them into the
>>> same function.
>> Please read scipy.stats.chisquare() because scipy.stats.chisquare() is
>> the 1-d case of yours.
>> Quote from the docstring:
>> " The chi square test tests the null hypothesis that the categorical data
>> has the given frequencies."
>> Also go to the web site provided in the docstring.
>> By default you get the expected frequencies but you can also put in your
>> own using the f_exp variable. You could do the same in your code.
> In fact, Warren correctly used stats.chisquare with the expected
> frequencies calculated from the null hypothesis and the corrected
> degrees of freedom. chisquare_contingency is in some sense a
> convenience method for taking care of these pre-calculations before
> calling stats.chisquare. Can you explain more clearly to me why we
> should not include such a convenience function?
Just a clarification, before I find time to work my way through the
stats.chisquare is a generic test for goodness-of-fit for discrete or
and from the docstring of it
"If no expected frequencies are given, the total
N is assumed to be equally distributed across all groups."
So the default is a uniform distribution.
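To make the default concrete, here is a small sketch with made-up counts (the numbers are purely illustrative). It only checks that leaving out f_exp gives the same result as passing equal expected frequencies explicitly:

```python
import numpy as np
from scipy import stats

# Made-up observed counts for six categories (illustrative only).
observed = np.array([16, 18, 16, 14, 12, 12])

# Default call: expected frequencies are assumed equal across categories.
chi2_default, p_default = stats.chisquare(observed)

# Same test with the uniform expected frequencies written out by hand.
uniform_expected = np.full(observed.shape, observed.sum() / observed.size)
chi2_explicit, p_explicit = stats.chisquare(observed, f_exp=uniform_expected)

print(chi2_default, chi2_explicit)
```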
chisquare_twoway is a special case that additionally calculates the
correct expected frequencies for the test of independence based on the
margin totals. The resulting distribution is not uniform.
I agree with Neil that this is a very useful convenience function.
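As a rough sketch of what such a convenience function does on top of stats.chisquare (this is not the proposed code; the function name and the example table are made up for illustration), the expected frequencies come from the margin totals and ddof is set so that the degrees of freedom are (R - 1) * (C - 1):

```python
import numpy as np
from scipy import stats

def chisquare_independence_sketch(table):
    """Hypothetical helper: chi-square test of independence for an R x C table."""
    table = np.asarray(table, dtype=float)
    row_totals = table.sum(axis=1, keepdims=True)
    col_totals = table.sum(axis=0, keepdims=True)
    # Expected counts under independence: (row total * column total) / N.
    expected = row_totals * col_totals / table.sum()
    n_rows, n_cols = table.shape
    dof = (n_rows - 1) * (n_cols - 1)
    # stats.chisquare uses k - 1 - ddof degrees of freedom, where k is the
    # number of cells, so choose ddof to end up with (R - 1) * (C - 1).
    ddof = table.size - 1 - dof
    chi2, p = stats.chisquare(table.ravel(), f_exp=expected.ravel(), ddof=ddof)
    return chi2, p, dof, expected

# Made-up 2 x 2 table, for illustration only.
chi2, p, dof, expected = chisquare_independence_sketch([[10, 20], [30, 40]])
print(chi2, p, dof)
print(expected)
```

The expected counts here are clearly not equal across cells, which is the difference from the default behaviour of stats.chisquare.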
I never heard of a one-way contingency table, my question was whether
the function should also handle 3-way or 4-way tables, additional to
I thought about the question how the input should be specified for my
initial response, the alternative would be to use the original data or
a "long" format instead of a table. But I thought that as a
convenience function using the table format will be the most common
In the past I have written functions that calculate the contingency
table, and it would be very useful to have a more complete coverage of
tools to work with contingency tables in scipy.stats (or temporarily
in statsmodels, where we are working also on the anova type of
So I think it is a nice function the way it is, and we don't have to
put all contingency table analysis into this function.
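For the "long" format mentioned above, building the table is a separate step that could live in its own helper. A minimal sketch, assuming two equal-length sequences of labels (the function name and the data are hypothetical):

```python
import numpy as np

def crosstab_sketch(row_labels, col_labels):
    """Hypothetical helper: count label pairs into a two-way table."""
    row_levels, row_idx = np.unique(row_labels, return_inverse=True)
    col_levels, col_idx = np.unique(col_labels, return_inverse=True)
    table = np.zeros((len(row_levels), len(col_levels)), dtype=int)
    # Add one to the matching cell for every observation; np.add.at
    # accumulates correctly when the same cell occurs more than once.
    np.add.at(table, (np.ravel(row_idx), np.ravel(col_idx)), 1)
    return row_levels, col_levels, table

# Made-up observations, one entry per subject.
group = ["m", "f", "m", "f", "m"]
answer = ["yes", "no", "no", "no", "yes"]
row_levels, col_levels, table = crosstab_sketch(group, answer)
print(row_levels, col_levels)
print(table)
```

The resulting table can then be passed to whatever function does the test, which keeps the test function itself small.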
>>>> Really this should be combined with fisher.py in ticket 956:
>>> Wow, apparently I have lots of disagreements today, but I don't think
>>> that this should be combined with Fisher's Exact test. (I would like
>>> to see that ticket mature to the point where it can be added to
>>> scipy.stats.) I like the functions in scipy.stats to correspond in a
>>> one-to-one manner with the statistical tests. I think that the docs
>>> should "See Also" the appropriate exact (and non-parametric) tests,
>>> but I think that one function/one test is a good rule. This is
>>> particularly true for people (like me) who would like to someday be
>>> able to use scipy.stats in a pedagogical context.
>> I don't see any 'disagreements' rather just different ways to do things
>> and identifying areas that need to be addressed for more general use.
> Agreed. :)