[SciPy-user] Numerical Recipes robust fit implementation

Anne Archibald peridot.faceted@gmail....
Mon Jul 16 06:54:28 CDT 2007


On 16/07/07, Angus McMorland <amcmorl@gmail.com> wrote:

> I'm going to have to think a bit more what I want to achieve to see if
> RANSAC is useful. Ultimately I hope to determine the probability of a
> given data set being exponentially distributed, by comparing the raw
> frequency distribution to an expected distribution based on a linear
> fit to the log transform of the raw one. It seems a bit like basing my
> 'expected' distribution on a subset of data from which outliers have
> been completely excluded is self-fulfilling, and having some other
> criterion for weighting of the error term (as medfit does) seems more
> appropriate. This is however very much just a gut feeling rather than
> an educated assessment, so any other comments are welcome.

If what you're trying to do is test whether your data points are
likely to have been drawn from a given distribution, you may be able
to do much better by not putting them in a histogram first, and using
something like the Kolmogorov-Smirnov test (scipy.stats.kstest). If
you have outliers you may still have a problem. (It's feasible to fit
parameters so they maximize the kstest p value, although of course the
p value you get at the end is no longer an actual probability, because
the parameters were tuned to the same data.) I suspect if you look in
the statistical literature you'll find other
tricks for fitting distributions directly (rather than fitting
histograms).
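
A minimal sketch of the kstest approach described above, in Python. The
synthetic sample, the helper names ks_exponential and fit_scale_by_ks, and
the search bounds are illustrative assumptions, not anything from this
thread. Minimizing the KS statistic is used here as a stand-in for
maximizing the p value; for a fixed sample size the two should select the
same scale.

```python
import numpy as np
from scipy import optimize, stats


def ks_exponential(samples, scale):
    """KS test of `samples` against an exponential with the given scale (mean).

    Hypothetical helper: 'expon' takes (loc, scale), and loc is fixed at 0.
    """
    return stats.kstest(samples, "expon", args=(0.0, scale))


def fit_scale_by_ks(samples):
    """Pick the scale that minimizes the KS statistic (assumed search bounds)."""
    mean = samples.mean()
    result = optimize.minimize_scalar(
        lambda s: ks_exponential(samples, s).statistic,
        bounds=(0.1 * mean, 10.0 * mean),  # assumption: true scale is near the mean
        method="bounded",
    )
    return result.x


# Synthetic data standing in for the raw (unbinned) measurements.
rng = np.random.default_rng(0)
samples = rng.exponential(scale=2.0, size=500)

# Scale chosen independently of the data: the p value means what it says.
stat_known, p_known = ks_exponential(samples, 2.0)

# Scale tuned to the same data: the p value is optimistic, as noted above.
scale_ks = fit_scale_by_ks(samples)
stat_fit, p_fit = ks_exponential(samples, scale_ks)

print("known scale: D=%.4f p=%.4f" % (stat_known, p_known))
print("fitted scale %.3f: D=%.4f p=%.4f" % (scale_ks, stat_fit, p_fit))
```

Note that no histogram is built anywhere: kstest works on the raw samples,
so no bin width has to be chosen.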

Anne

