[SciPy-User] scipy.stats.fit inquiry

josef.pktd@gmai...
Tue Oct 20 05:52:15 CDT 2009


On Mon, Oct 19, 2009 at 11:53 PM, Anne Archibald
<peridot.faceted@gmail.com> wrote:
> 2009/10/19 Leon Adams <leon_r_adams@hotmail.com>:
>
>> I am using scipy.stats module to perform some distribution fitting. What I
>> cannot seem to get a handle on is how to compare the quality of fit
>> achieved. At this stage the docs do not seem to be quite as useful... As
>> an example, I fit my data using
>>
>>
>> fitExp = st.expon.fit(data)
>>
>> which returns an array [ 0.99999999  1.33310547]
>>
>> How do we access the resulting maximized likelihood, mean square errors ...
>> Also, how would we go about calculating KS tests for the fitted parameters?
>> Mainly I am interested in how good this fit is, and what diagnostics we
>> have available.
>
> I'm not sure what tools we have in scipy, but there's always the
> everything-looks-like-a-nail approach: fit for the parameters, then
> use the fitted distribution to generate many data sets and see how
> many of them are a better fit than yours.
>
> We do have a K-S test, which would serve as a reasonable way to answer
> "how well does this data fit this distribution". The p value you get
> will be wrong if you obtained the distribution by fitting, but the K-S
> value will still be a reasonable measure of quality-of-fit (which you
> can compare to the quality of models fit to generated data sets). The
> scatter in model parameters obtained by fitting generated data sets
> will give you an estimate of the uncertainties on the fitted
> parameters.
>
> For smarter approaches, for example Cash statistics, I'm not sure
> whether scipy has anything more sophisticated, but at least scipy's
> distributions will give you PDFs you can take negative logs of.
>
> Anne

For other goodness-of-fit tests, scipy.stats also has chisquare, and I have
written a power_discrepancy test; both require binning the data.

For visual inspection, scipy.stats has a probability plot and statsmodels
has a qqplot. scipy.stats also has an Anderson-Darling test (``anderson``)
that supports the exponential distribution, but I have never verified
whether its results are correct.
There are a few examples of the KS test in the scipy.stats tutorial, and
I use it heavily in the tests for scipy.stats.distributions.
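
As a rough sketch of how the KS test combines with Anne's suggestion of
refitting generated data sets: the data below are synthetic, the helper
ks_after_fit is a made-up name for this illustration, and 200 bootstrap
replications is an arbitrary number.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.exponential(scale=1.3, size=200)  # synthetic stand-in for real data


def ks_after_fit(sample):
    """Fit expon to `sample`; return (params, KS distance to the fitted model).

    Hypothetical helper for this sketch, not part of scipy.
    """
    params = stats.expon.fit(sample)  # (loc, scale)
    d = stats.kstest(sample, "expon", args=params).statistic
    return params, d


params, d_obs = ks_after_fit(data)

# Maximized log-likelihood: sum of log-pdf values at the fitted parameters.
loglik = np.sum(stats.expon.logpdf(data, *params))

# Parametric bootstrap: simulate from the fitted model, refit each simulated
# data set, and record its KS distance and its fitted scale.
n_boot = 200  # arbitrary; more replications give a smoother estimate
d_boot = np.empty(n_boot)
scale_boot = np.empty(n_boot)
for i in range(n_boot):
    fake = stats.expon.rvs(*params, size=len(data), random_state=rng)
    fake_params, d_boot[i] = ks_after_fit(fake)
    scale_boot[i] = fake_params[1]

# Fraction of simulated data sets that fit at least as badly as the real one.
p_boot = np.mean(d_boot >= d_obs)
# Scatter of the refitted scale as a rough uncertainty on the estimate.
scale_se = scale_boot.std(ddof=1)

print("log-likelihood:", loglik)
print("KS distance:", d_obs, "bootstrap p-value:", p_boot)
print("scale:", params[1], "+/-", scale_se)
```

The bootstrap p-value is used here instead of the one kstest reports,
because the latter assumes the parameters were not estimated from the
same data.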

I have never heard of the Cash statistic.

I also have an example,
http://code.google.com/p/joepython/source/browse/trunk/joepython/scipystats/enhance/try_VaR.py
in which I tried to fit a dataset to all distributions that are available
in scipy.stats.distributions. The main remaining problem is that fit
estimates loc by default, and in most cases we wouldn't want to estimate
loc if the distribution has a finite boundary in its support.
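
A small sketch of fitting several candidates with loc held fixed. It
assumes a SciPy version whose fit method accepts the ``floc`` keyword; the
data are synthetic and the three candidate names are picked only for
illustration, this is not the code from the linked example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.gamma(shape=2.0, scale=1.5, size=300)  # synthetic, positive-valued

# Candidates whose support starts at loc, so loc = 0 is the natural boundary.
candidates = ("expon", "gamma", "lognorm")

results = {}
for name in candidates:
    dist = getattr(stats, name)
    # floc=0 holds the location at 0; only shape and scale are estimated.
    params = dist.fit(data, floc=0)
    d = stats.kstest(data, name, args=params).statistic
    results[name] = (params, d)

# A smaller KS distance means the fitted cdf tracks the empirical cdf more closely.
for name, (params, d) in sorted(results.items(), key=lambda kv: kv[1][1]):
    print(name, "params:", params, "KS distance:", d)
```

The KS distances are only a ranking aid here; distributions with more
free parameters will tend to fit more closely.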

Josef

