[SciPy-Dev] chi-square test for a contingency (R x C) table

Warren Weckesser warren.weckesser@enthought....
Thu Jun 17 09:41:21 CDT 2010


Bruce Southey wrote:
> On 06/16/2010 11:58 PM, Warren Weckesser wrote:
>   
>> The feedback in this thread inspired me to generalize my original code
>> to the n-way test of independence.  I have attached the revised code to
>> a new ticket:
>>
>>      http://projects.scipy.org/scipy/ticket/1203
>>
>> More feedback would be great!
>>
>> Warren
>>
>>
>>    
>>     
> The handling for a one way table is wrong:
>  >>>print 'One way', chisquare_nway([6, 2])
> (0.0, 1.0, 0, array([ 6.,  2.]))
>
> It should also do the marginal independence tests.
>   

As I explained in the description of the ticket and in the docstring, 
this function is not intended for doing the 'one-way' goodness of fit.  
stats.chisquare should be used for that.  Calling chisquare_nway with a 
1D array amounts to doing a test of independence between groupings but 
only giving a single grouping, hence the trivial result.  This is 
intentional.

I guess the question is: should there be a "clever" chi-square function 
that figures out what the user probably wants to do?
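
To illustrate the distinction, here is a minimal sketch (not the code from the
ticket) comparing the two calculations on the same 1D input. It assumes the
standard degrees-of-freedom formula for mutual independence in a d-way table,
prod(shape) - sum(shape) + d - 1, and it assumes the goodness-of-fit test uses
equal expected frequencies, which is the default in stats.chisquare. All
variable names are illustrative.

```python
import numpy as np

observed = np.array([6.0, 2.0])

# Independence test with a single grouping: the only marginal is the
# table itself, so the expected frequencies equal the observed ones.
expected_indep = observed.copy()
chi2_indep = ((observed - expected_indep) ** 2 / expected_indep).sum()
# Assumed formula: prod(shape) - sum(shape) + d - 1
dof_indep = observed.size - sum(observed.shape) + observed.ndim - 1

# One-way goodness of fit against equal expected frequencies.
expected_gof = np.full_like(observed, observed.mean())
chi2_gof = ((observed - expected_gof) ** 2 / expected_gof).sum()
dof_gof = observed.size - 1

print(chi2_indep, dof_indep)  # trivial: statistic 0 with 0 degrees of freedom
print(chi2_gof, dof_gof)      # nontrivial statistic with 1 degree of freedom
```

The first result is the trivial case reported above; the second is the
question a one-way goodness-of-fit test actually answers.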


> I would have expected the conversion of the input into an array in the 
> chisquare_nway function.  If the input is not an array, then there is 
> a potential bug waiting to happen because you expect numpy to correctly 
> compute the observed minus expected. For example, if the input is a list 
> then it relies on numpy doing a list minus a ndarray.  It is also 
> inefficient in the sense that you have to convert the input twice (once 
> for the expected values and once for the observed minus expected 
> calculation).


I was going to put in something like table = np.asarray(table), but then 
I noticed that, since `expected` had already been converted to an array, 
the calculation worked even if `table` was a list.  E.g.

In [4]: chisquare_nway([[10,10],[5,25]])
Out[4]:
(6.3492063492063489,
 0.011743382301172606,
 1,
 array([[  6.,  14.],
       [  9.,  21.]]))

But I will put in the conversion--that will make it easier to do a few 
other sanity checks on the input before trying to do any calculations.
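
A rough sketch of what converting once and then checking the input might look
like, restricted to the two-way case for brevity. The function name
chi2_2way_sketch and the particular checks are assumptions for illustration,
not the code attached to the ticket.

```python
import numpy as np

def chi2_2way_sketch(table):
    """Hypothetical two-way version: convert once, validate, then compute."""
    table = np.asarray(table, dtype=np.float64)  # single conversion up front
    if table.ndim != 2:
        raise ValueError("this sketch handles two-way tables only")
    if np.any(table < 0):
        raise ValueError("counts must be non-negative")
    row = table.sum(axis=1, keepdims=True)   # row totals, shape (R, 1)
    col = table.sum(axis=0, keepdims=True)   # column totals, shape (1, C)
    expected = row * col / table.sum()       # broadcasts to shape (R, C)
    if np.any(expected == 0):
        raise ValueError("a zero marginal total gives a zero expected frequency")
    chi2 = ((table - expected) ** 2 / expected).sum()
    return chi2, expected

chi2, expected = chi2_2way_sketch([[10, 10], [5, 25]])
print(chi2)
print(expected)
```

For the table used above, this reproduces the statistic and the expected
frequencies shown in Out[4].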

>  You can also get interesting errors with a string input 
> where the reason may not be obvious:
>
>  >>>print 'twoway', chisquare_nway([['6', '2'], ['4', '11']])
>    File "chisquare_nway.py", line 132, in chisquare_nway
>      chi2 = ((table - expected)**2 / expected).sum()
> TypeError: unsupported operand type(s) for -: 'list' and 'numpy.ndarray'
>
>   
> I don't recall how np.asarray handles very large numbers but I would 
> also suggest an optional dtype argument instead of forcing float64 dtype:
> "table = np.asarray(table, dtype=np.float64)"
>
>   

Sure, I can add that.
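
One possible shape for that option, as a sketch only: the helper name to_table
is hypothetical, and float64 is kept as the default so existing calls behave
the same.

```python
import numpy as np

def to_table(table, dtype=np.float64):
    """Hypothetical helper: convert the input once, with a selectable dtype."""
    return np.asarray(table, dtype=dtype)

counts = [[10, 10], [5, 25]]
t_default = to_table(counts)                    # float64 unless told otherwise
t_long = to_table(counts, dtype=np.longdouble)  # caller-selected precision
print(t_default.dtype, t_long.dtype)
```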

> In expected_nway(), you could store 'range(d)' in a variable before the 
> loop, although the savings are small for small tables.
> Also, I would like to remove the usage of set() in the loop.
> If d=5 and k=2:
>
>  >>> list(set(range(d))-set([k]))
> [0, 1, 3, 4]
>  >>> rd=range(d) #which would be outside the loop
>  >>> [ elem for elem in rd if elem != k ]
> [0, 1, 3, 4]
>
>   

Looks good--I'll make that change.
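
With that change, the expected-frequency calculation could be sketched roughly
as follows. This is an illustrative reconstruction, not the code attached to
the ticket: the name expected_nway_sketch is hypothetical, and it assumes the
expected frequencies under mutual independence are the outer product of the
one-dimensional marginal totals divided by total**(d - 1).

```python
import numpy as np

def expected_nway_sketch(table):
    """Hypothetical sketch of expected frequencies under mutual independence."""
    table = np.asarray(table, dtype=np.float64)
    d = table.ndim
    axes = range(d)              # built once, outside the loop
    total = table.sum()
    expected = np.ones_like(table)
    for k in axes:
        # Sum over every axis except k (no set() needed).
        other = tuple(a for a in axes if a != k)
        marginal = table.sum(axis=other)
        # Reshape so the marginal broadcasts along axis k only.
        shape = [1] * d
        shape[k] = table.shape[k]
        expected = expected * marginal.reshape(shape)
    return expected / total ** (d - 1)

e2 = expected_nway_sketch([[10, 10], [5, 25]])
e3 = expected_nway_sketch(np.arange(1.0, 25.0).reshape(2, 3, 4))
print(e2)
print(e3.sum())
```

For the two-way table this gives the same expected frequencies as in Out[4],
and in the three-way case the expected frequencies sum to the table total.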


> Bruce