[SciPy-user] Numerical Recipes robust fit implementation

Anne Archibald peridot.faceted@gmail....
Mon Jul 16 18:32:06 CDT 2007


On 16/07/07, Angus McMorland <amcmorl@gmail.com> wrote:

> Kolmogorov-Smirnov is the way I had intended originally to go (there's
> a variant that approximates the probability for grouped (read:
> frequency) data). But, as Anne rightly points out I can apply it to
> the individual variates since I have them--- my earlier approach arose
> mainly from how I was looking at the data, and I need to get away from
> that.
>
> I still need an expected cdf based on my hypothesized distribution,
> parameters for which I want to estimate from my data: that's where the
> fitting comes in. My only remaining decision is whether a robust or
> least-squares fitting approach is more appropriate for deriving the
> expected distribution. The former is inherently self-fulfilling, in
> that it excludes from the estimation of the expected distribution the
> outliers that are likely the important deviations, and the latter will
> include all the data, but fit none of it very well. Time to play
> around and see how much difference there is, I think.
>
> Thanks for all your suggestions, they've been very useful.

I don't know why I didn't think of this before, since I've been working
with them, but if you want to estimate a PDF (and therefore a CDF),
kernel density estimators are a very reasonable approach. SciPy
implements one (scipy.stats.gaussian_kde), but if you want to include
outliers you may find a kernel with heavier tails than a Gaussian
useful. I don't know of a specifically robust kernel density estimator,
but kernel density estimators in general have seen extensive work in
the statistical literature.
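
To make the idea concrete, here is a rough sketch in plain Python (not
SciPy's implementation) of a kernel estimate of the CDF: it is just the
average of the kernel's own CDF centred on each data point. The
function name kde_cdf, the sample values and the hand-picked bandwidth
are all made up for illustration, and the Cauchy kernel is only one
possible choice of a heavier-tailed kernel.

```python
import math


def kde_cdf(x, samples, bandwidth, kernel="gaussian"):
    """Kernel estimate of the CDF at x.

    Averages the kernel's CDF, centred on each sample and scaled by
    the bandwidth. Hypothetical helper, for illustration only.
    """
    total = 0.0
    for s in samples:
        z = (x - s) / bandwidth
        if kernel == "gaussian":
            # CDF of the standard normal distribution
            total += 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        elif kernel == "cauchy":
            # CDF of the standard Cauchy distribution (much heavier tails)
            total += 0.5 + math.atan(z) / math.pi
        else:
            raise ValueError("unknown kernel: %r" % (kernel,))
    return total / len(samples)


# Made-up data: a cluster near zero plus one outlier at 8.0
samples = [-1.2, -0.7, -0.3, 0.0, 0.1, 0.4, 0.9, 1.3, 8.0]
h = 0.5  # bandwidth chosen by hand, not by any selection rule

for x in (0.0, 4.0, 8.0):
    print(x, kde_cdf(x, samples, h, "gaussian"), kde_cdf(x, samples, h, "cauchy"))
```

With the Gaussian kernel the estimated CDF is nearly flat between the
cluster and the outlier, while the Cauchy kernel spreads some
probability into that gap. Bandwidth choice matters at least as much as
kernel shape, so treat the numbers as a starting point for playing
around rather than as a recommendation.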

Anne

