[SciPy-Dev] chi-square test for a contingency (R x C) table

josef.pktd@gmai... josef.pktd@gmai...
Fri Jun 4 13:12:06 CDT 2010


On Fri, Jun 4, 2010 at 1:08 PM, Bruce Southey <bsouthey@gmail.com> wrote:
> On 06/03/2010 08:27 AM, Warren Weckesser wrote:
>>
>> Just letting you know that I'm not ignoring all the great comments from
>> josef, Neil and Bruce about my suggestion for chisquare_contingency.
>> Unfortunately, I won't have time to think about all the deeper
>> suggestions for another week or so.   For now, I'll just say that I
>> agree with josef's and Neil's suggestions for the docstring, and that
>> Neil's summary of the function as simply a convenience function that
>> calls stats.chisquare with appropriate arguments to perform a test of
>> independence on a contingency table is exactly what I had in mind.
>>
>> Warren
>>
>>
>>
>
> Hi,
> I looked at how SAS handles n-way tables. What it appears to do is break the
> original table down into a set of 2-way tables and does the analysis on each
> of these. So a 3 by 4 by 5 table is processed as three 2-way tables with the
> results of each 4 by 5 table presented. I do not know how Stata and R
> analyze n-way tables.
>
> Consequently, I rewrote my suggested code (attached) to handle 3 and 4 way
> tables by using recursion. There should be some Python way to do that
> recursion for any number of dimensions. I also added the 1-way table (but
> that has a different hypothesis than the 2-way table) so users can send a
> 1-d table.

(very briefly because I don't have much time today)

I think these are good extensions, but to handle all cases the
function would become too large and would need several options.

On your code and SAS (correct me if my quick reading is wrong):
You seem to be calculating conditional independence for the last two
variables conditional on the values of the first variables. I think
this could be generalized to all pairwise independence tests.
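
As a rough sketch of the conditional case for any number of dimensions
(helper names here are hypothetical and this is not the attached code;
it assumes a table of counts with no zero expected cells), np.ndindex can
walk the leading axes so that no explicit recursion is needed:

```python
import numpy as np
from scipy import stats


def chi2_two_way(table):
    """Pearson chi-square test of independence for a 2-D table of counts."""
    table = np.asarray(table, dtype=float)
    total = table.sum()
    # Expected counts under independence: outer product of margins / total.
    # Assumes no expected count is zero (otherwise this divides by zero).
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / total
    stat = ((table - expected) ** 2 / expected).sum()
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return stat, dof, stats.chi2.sf(stat, dof)


def conditional_tests(table):
    """Test the last two axes for independence at each level of the leading axes."""
    table = np.asarray(table)
    results = {}
    # np.ndindex yields every combination of leading indices; for a plain
    # 2-D table it yields the single empty index (), i.e. the table itself.
    for idx in np.ndindex(*table.shape[:-2]):
        results[idx] = chi2_two_way(table[idx])
    return results
```

For a 3 by 4 by 5 table this returns three results keyed (0,), (1,), (2,),
one per 4 by 5 slice.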

Similarly, I'm a bit surprised that SAS uses conditional and not
marginal independence, I would have thought that the test for marginal
independence (aggregate out all but 2 variables) would be the more
common use case.
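
A minimal sketch of "aggregate out all but 2 variables", again with
hypothetical helper names: summing over the other axes gives one 2-way
marginal table per pair of variables, and each can then go through an
ordinary 2-way independence test.

```python
from itertools import combinations

import numpy as np


def marginal_table(table, i, j):
    """Sum out every axis except i and j (with i < j), leaving a 2-D table."""
    table = np.asarray(table)
    other = tuple(ax for ax in range(table.ndim) if ax not in (i, j))
    return table.sum(axis=other)


def pairwise_marginal_tables(table):
    """Return {(i, j): 2-D marginal table} for every pair of axes i < j."""
    table = np.asarray(table)
    return {(i, j): marginal_table(table, i, j)
            for i, j in combinations(range(table.ndim), 2)}
```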

Initially, I was thinking just about independence of all variables in
a 3 or more way table, i.e. P(x,y,z)=P(x)*P(y)*P(z)
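
A sketch of that joint (mutual) independence test, under my assumptions:
expected counts are the product of the one-way margins divided by
N**(k-1), and the degrees of freedom are the number of cells minus 1
minus the sum of (levels - 1) over the variables. The function name is
made up for illustration, and zero expected cells are not handled.

```python
from functools import reduce

import numpy as np
from scipy import stats


def joint_independence(table):
    """Pearson chi-square test of P(x,y,z,...) = P(x)*P(y)*P(z)*... for a k-way table."""
    table = np.asarray(table, dtype=float)
    k = table.ndim
    total = table.sum()
    # One-way margin for each variable: sum over all the other axes.
    margins = [table.sum(axis=tuple(a for a in range(k) if a != ax))
               for ax in range(k)]
    # The outer product of all margins has the same shape as the table.
    expected = reduce(np.multiply.outer, margins) / total ** (k - 1)
    stat = ((table - expected) ** 2 / expected).sum()
    dof = table.size - 1 - sum(n - 1 for n in table.shape)
    return stat, dof, stats.chi2.sf(stat, dof)
```

For a 2-way table the degrees of freedom reduce to the usual (R-1)*(C-1).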

My opinion is that these variations of tests would fit better in a
class where all pairwise conditional, marginal, and joint hypotheses
can be supplied as methods, or could be split up into a group of
functions.

Thanks,

Josef

>
> The data used is from two SAS examples and I added a dimension to get a
> 4-way table. I included the SAS values but these are only to 4 decimal
> places for reference.
>
> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect029.htm
> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_freq_sect030.htm
>
> What is missing:
> 1) Docstring and tests but those are dependent on what is ultimately decided
> 2) Other test statistics but scipy.stats versions are not very friendly in
> that these do not accept a 2-d array
> 3) A way to do recursion
> 4) Ability to label the levels etc.
> 5) Correct handling of input types.
>
> Bruce
>

