# [SciPy-User] stats.chisquare issues

Bruce Southey bsouthey@gmail....
Mon Sep 27 14:41:36 CDT 2010

```
On 09/26/2010 03:17 PM, josef.pktd@gmail.com wrote:
> On Sun, Sep 26, 2010 at 3:02 PM, Gökhan Sever <gokhansever@gmail.com> wrote:
>> Hello,
>> Consider these examples:
>> I[35]: np.histogram(ydata, bins=6)
>> O[35]:
>> (array([4, 1, 3, 0, 0, 1]),
>>   array([   2.8       ,  146.33333333,  289.86666667,  433.4       ,
>>          576.93333333,  720.46666667,  864.        ]))
>> I[36]: np.histogram(ypred, bins=6)
>> O[36]:
>> (array([4, 2, 2, 0, 0, 1]),
>>   array([  22.08895   ,  166.34439167,  310.59983333,  454.855275  ,
>>          599.11071667,  743.36615833,  887.6216    ]))
>> I[45]: stats.chisquare([4, 1, 3, 0, 0, 1], [4, 2, 2, 0, 0, 1])
>> ---------------------------------------------------------------------------
>> AttributeError                            Traceback (most recent call last)
>> /home/gsever/Desktop/<ipython console> in <module>()
>> /usr/lib/python2.6/site-packages/scipy/stats/stats.pyc in chisquare(f_obs, f_exp, ddof)
>>     2516     if f_exp is None:
>>     2517         f_exp = array([np.sum(f_obs,axis=0)/float(k)] * len(f_obs),float)
>> ->  2518     f_exp = f_exp.astype(float)
>>     2519     chisq = np.add.reduce((f_obs-f_exp)**2 / f_exp)
>>     2520     return chisq, chisqprob(chisq, k-1-ddof)
>> AttributeError: 'list' object has no attribute 'astype'
>> Here, I would expect any SciPy function, including chisquare, to be able
>> to handle lists.
>> ############################################
>> This one returns NaNs instead:
>> I[46]: stats.chisquare(np.array([4, 1, 3, 0, 0, 1]), np.array([4, 2, 2, 0, 0, 1]))
>> O[46]: (nan, nan)
>> Again, I should have been aware of this, since the division has a 0 in it.
>> I[47]: a1 = np.ma.masked_equal([4,1,3,0,0,1], 0)
>> I[48]: a2 = np.ma.masked_equal([4,2,2,0,0,1], 0)
>> Further,
>> I[49]: stats.chisquare(a1, a2)
>> O[49]: (1.0, 0.96256577324729631)
>> I[50]: stats.mstats.chisquare(a1, a2)
>> O[50]: (1.0, 0.80125195690120077)
> masking doesn't remove the values, so when you have a masked array
> you should use compressed() or similar,
>
> for example, dropping the zero bins:

You should use the masked version of chisquare() in mstats for masked
array inputs. However, hiding zeros is only correct when both the
observed and the expected counts are zero.

>>>> stats.chisquare(np.array([4, 1, 3, 1.]),np.array([4, 2, 2, 1.]))
> (1.0, 0.80125195690120077)
>
> Not accepting lists is a bug.

It is not a bug, because the docstring says arrays, not array-like.

> Returning NaNs when the expected count in a bin is zero might be by
> design, but we need to check this.
>
>>>> stats.chisquare(np.array([4, 1, 3, 1.]),np.array([4, 2, 0, 1.]))
> (inf, nan)

This is correct, since the expected value for a cell is zero (resulting
in division by zero). You cannot use the chi-square test in this
situation. You might be able to get the Fisher exact test (see ticket
956, http://projects.scipy.org/scipy/ticket/956) to work here.

If you are doing something like density estimation, then you probably
need to select your bins (especially in the tails) more carefully to
prevent this from happening.

Bruce

```
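The two p-values reported for the masked inputs (0.9626 from `stats.chisquare`, 0.8013 from `stats.mstats.chisquare`) can both be reproduced from the same statistic of 1.0 by changing only the degrees of freedom. The sketch below is a reconstruction from the numbers in the thread, not code from the thread. It assumes the difference is k - 1 with k = 6 cells (masked cells still counted) versus k = 4 cells (masked cells dropped), and it uses `scipy.stats.chi2.sf`, the chi-square survival function.

```python
import numpy as np
from scipy.stats import chi2

obs = np.ma.masked_equal([4, 1, 3, 0, 0, 1], 0)
exp = np.ma.masked_equal([4, 2, 2, 0, 0, 1], 0)

# Keep only the unmasked cells; here both masks cover the same two bins.
o = obs.compressed().astype(float)  # [4, 1, 3, 1]
e = exp.compressed().astype(float)  # [4, 2, 2, 1]

# Pearson statistic: sum of (observed - expected)**2 / expected.
stat = np.sum((o - e) ** 2 / e)  # 0 + 0.5 + 0.5 + 0 = 1.0

# Degrees of freedom k - 1, where k is the number of cells counted.
p_all_cells = chi2.sf(stat, len(obs) - 1)  # k = 6 -> df = 5, p ~0.9626
p_kept_cells = chi2.sf(stat, len(o) - 1)   # k = 4 -> df = 3, p ~0.8013
print(stat, p_all_cells, p_kept_cells)
```

If this reading is right, the masked cells were not contributing to the statistic in either call but were still being counted as cells when the degrees of freedom were chosen.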
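Bruce's rule (drop a bin only when both its observed and expected counts are zero, and do not use the chi-square test when an expected count is zero but the observed count is not) can be sketched as a small helper. The name `chisquare_drop_empty` is hypothetical and not part of SciPy. It converts list input with `np.asarray`, computes the Pearson statistic by hand, and takes the p-value from `scipy.stats.chi2.sf` with k - 1 - ddof degrees of freedom, mirroring the `chisquare` source lines shown in the traceback.

```python
import numpy as np
from scipy.stats import chi2


def chisquare_drop_empty(f_obs, f_exp, ddof=0):
    """Hypothetical helper (not a SciPy API): chi-square goodness-of-fit
    test that drops cells whose observed AND expected counts are both 0."""
    f_obs = np.asarray(f_obs, dtype=float)  # accepts lists as well as arrays
    f_exp = np.asarray(f_exp, dtype=float)
    if f_obs.shape != f_exp.shape:
        raise ValueError("f_obs and f_exp must have the same shape")

    # Remove only the bins that are empty in both observed and expected.
    both_empty = (f_obs == 0) & (f_exp == 0)
    f_obs, f_exp = f_obs[~both_empty], f_exp[~both_empty]

    # Any remaining zero expected count has a nonzero observed count,
    # which would mean division by zero: the test does not apply.
    if np.any(f_exp == 0):
        raise ValueError("zero expected count in a bin with observations; "
                         "merge bins or use a different test")

    k = f_obs.size
    stat = np.sum((f_obs - f_exp) ** 2 / f_exp)
    return stat, chi2.sf(stat, k - 1 - ddof)


# Both empty bins are dropped, leaving 4 cells (df = 3):
# statistic 1.0, p-value ~0.8013.
print(chisquare_drop_empty([4, 1, 3, 0, 0, 1], [4, 2, 2, 0, 0, 1]))

# Expected count 0 with an observed count of 3: rejected instead of inf/nan.
try:
    chisquare_drop_empty([4, 1, 3, 1], [4, 2, 0, 1])
except ValueError as err:
    print("not applicable:", err)
```

Raising an error instead of returning `(inf, nan)` is a design choice in this sketch. It makes the "you cannot use the chi-square test here" case explicit, so the caller has to rebin or switch tests.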