[SciPy-User] An extra parameter to stats.chisquare ?

josef.pktd@gmai... josef.pktd@gmai...
Mon Aug 3 10:12:18 CDT 2009


On Sun, Aug 2, 2009 at 10:43 PM, Pierre GM<pgmdevlist@gmail.com> wrote:
>
> On Aug 2, 2009, at 5:18 PM, josef.pktd@gmail.com wrote:
>
>> On Sun, Aug 2, 2009 at 4:05 PM, Pierre GM<pgmdevlist@gmail.com> wrote:
>>> All,
>>> stats.chisquare requires a mandatory parameter (frequency of
>>> observations) and an optional argument (theoretical frequencies). In
>>> that second case, I think we have to introduce yet another parameter,
>>> p, corresponding to the number of parameters of the theoretical
>>> distribution: the number of degrees-of-freedom for the chisqprob would
>>> then be k-p (with k the sample size), instead of the current k-1. Of
>>> course, we can set p=1 by default.
>>> Comments ?
>>> P.
>>
>> No disagreement with adding e.g. "ddof" as additional keyword
>> parameter. This might be also relevant for other tests where the data
>> can be based on prior estimation. (The same problem shows up with
>> tests after regression.)
>>
>> For the chisquare test, I'm not sure about the theory, since I only
>> used chisquare without estimating parameters. Wikipedia seems a bit
>> ambiguous:
>
> Well, I guess we'd need a "real" statistician. From what I gathered,
> when you fit your N observations to a distribution with p parameters
> (eg, 2 for normal, 1 for logseries), the ddof is N-(p+1): http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
> However, that works as long as all the parameters are independent. If
> one depends on the others or can be related to the others, we switch
> from p independent parameters to (p-1), thus giving a dof of N-p. So,
> wikipedia looks right.


How about this change, then:

def chisquare(f_obs, f_exp=None, ddof=0):
      ....
      return chisq, chisqprob(chisq, k-1-ddof)

The default covers the case where no parameters are estimated
(dof=k-1), e.g. create a random sample and compare it to a distribution
with *given* parameters.
I didn't find a reference for your statement that the parameter
(estimators) have to be independent. I only found more references to
efficient maximum likelihood estimation, in which case it is k-1-p. If
the parameters are estimated in a different way, then the dof
should be between k-1-p and k-1, or it is possible that the asymptotic
distribution is not a chisquare, in which case this test is not
appropriate.

We could add something like this in the notes.
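
A minimal, standard-library-only sketch of the proposed signature follows. The names chisquare_ddof and chi2_sf are hypothetical: chi2_sf is only a stand-in for chisqprob that uses the closed-form sums available for integer degrees of freedom, and none of this is the actual scipy.stats code.

```python
import math


def chi2_sf(x, dof):
    """Chi-square survival function for integer dof >= 1.

    Hypothetical stand-in for chisqprob, based on the finite sums that
    exist for integer degrees of freedom.
    """
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return total
    # Odd dof: erfc(sqrt(x/2)) plus a finite sum of half-integer gamma terms.
    total = math.erfc(math.sqrt(half))
    term = math.exp(-half) * math.sqrt(half) / (math.sqrt(math.pi) / 2.0)
    for i in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= half / (i + 0.5)
    return total


def chisquare_ddof(f_obs, f_exp=None, ddof=0):
    """Chi-square statistic and p-value with dof = k - 1 - ddof."""
    k = len(f_obs)
    if f_exp is None:
        # No expected frequencies given: assume all cells equally likely.
        f_exp = [sum(f_obs) / k] * k
    chisq = sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))
    return chisq, chi2_sf(chisq, k - 1 - ddof)


f_obs = [16, 18, 16, 14, 12, 12]
stat, p_given = chisquare_ddof(f_obs)                # dof = k - 1 = 5
_, p_two_estimated = chisquare_ddof(f_obs, ddof=2)   # dof = k - 1 - 2 = 3
print(stat, p_given, p_two_estimated)
```

With the same statistic, a larger ddof leaves fewer degrees of freedom and therefore a smaller p-value, which is why ignoring estimated parameters makes the test too lenient.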

>
>> If you need goodness-of-fit tests, then you could also try my
>> implementation of a more general class of gof statistics (power
>> discrepancy). I sent it to the mailing list a while ago.
>
>  From 04/27 ? I def'ny gonna give it a try, thanks a lot.

If you are testing a discrete distribution, then there is also a helper
function in test_discrete_basic.py in stats/tests:

def check_discrete_chisquare(distfn, arg, rvs, alpha, msg):
    '''perform chisquare test for random sample of a discrete distribution'''
    ....

The main point of the function is to do an equal-weight binning that
maintains a minimum expected frequency in each cell, which is
recommended (>=5 expected observations per cell for the chisquare
distribution to be an appropriate approximation).
(Note: the function is not fully cleaned up and not tested on its
own, but it is used for all discrete distributions in stats.tests.)
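
To illustrate the minimum-expected-frequency idea only, here is a simplified sketch. It is not the code in test_discrete_basic.py and it swaps in a plain greedy merge of adjacent cells instead of equal-weight binning. The names merge_cells and collapse are hypothetical.

```python
def merge_cells(expected, min_expected=5.0):
    """Greedily merge adjacent cells, left to right, until each merged
    cell has an expected count of at least min_expected.

    Returns a list of (start, stop) index ranges into the original cells.
    """
    bins = []
    start = 0
    acc = 0.0
    for i, e in enumerate(expected):
        acc += e
        if acc >= min_expected:
            bins.append((start, i + 1))
            start = i + 1
            acc = 0.0
    if start < len(expected):
        # Leftover tail is too small on its own: fold it into the last bin.
        if bins:
            first, _ = bins.pop()
            bins.append((first, len(expected)))
        else:
            bins.append((0, len(expected)))
    return bins


def collapse(values, bins):
    """Sum observed or expected counts over the merged cells."""
    return [sum(values[a:b]) for a, b in bins]


expected = [1, 2, 3, 10, 4, 0.5]
observed = [0, 3, 2, 12, 3, 0]
bins = merge_cells(expected)
merged_expected = collapse(expected, bins)
merged_observed = collapse(observed, bins)
print(bins, merged_expected, merged_observed)
```

The merged observed and expected counts can then be passed to the chisquare test, with k taken as the number of merged cells.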

Josef



