[SciPy-user] Fitting an arbitrary distribution

josef.pktd@gmai... josef.pktd@gmai...
Thu May 21 21:27:20 CDT 2009


On Thu, May 21, 2009 at 9:47 PM, David Baddeley
<david_baddeley@yahoo.com.au> wrote:
>
> Hi all,
>
> I want to fit an arbitrary distribution (in this case the sum of multiple Gaussians) to some measured data and was wondering if anyone could give me any pointers as to the best way of doing this. I'd like to avoid fitting to a histogram if possible. How do the .fit() methods of the various distributions under scipy.stats do it? My first thought would be to compare the cumulative distribution of my data with that of the model distribution using something like the Kolmogorov-Smirnov metric (maximum absolute distance between the curves) and to minimize this using optimize.fmin. Is this the right way to do it? Or is there an easier way?
>

I have an example script that tries to fit a dataset to all
distributions in scipy.stats:

http://code.google.com/p/joepython/source/browse/trunk/joepython/scipystats/enhance/try_VaR.py

I use ksstat (the Kolmogorov-Smirnov statistic) as the distance metric.
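
A minimal sketch of the idea from the question, minimizing the
Kolmogorov-Smirnov distance with optimize.fmin. This is not the script
linked above. It assumes a plain normal model and synthetic data; the
helper name ks_distance and the true values loc=2.0, scale=1.5 are made
up for illustration.

```python
import numpy as np
from scipy import optimize, stats


def ks_distance(params, data):
    """KS distance between the data and a normal(loc, scale) model."""
    loc, log_scale = params
    scale = np.exp(log_scale)  # optimize log(scale) so scale stays positive
    # kstest returns (statistic, p-value); element 0 is the maximum absolute
    # distance between the empirical CDF and the model CDF.
    return stats.kstest(data, "norm", args=(loc, scale))[0]


# Synthetic stand-in for measured data (assumed values, illustration only).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=2000)

# Rough starting values taken from the data itself.
start = np.array([np.median(data), np.log(data.std())])
best = optimize.fmin(ks_distance, start, args=(data,), disp=False)

loc_hat, scale_hat = best[0], np.exp(best[1])
print("loc = %.3f, scale = %.3f, KS distance = %.4f"
      % (loc_hat, scale_hat, ks_distance(best, data)))
```

The KS distance is not smooth in the parameters, so a derivative-free
simplex search such as fmin is a reasonable match, but it can stall in
local minima when the starting values are poor.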

If your data has full support on the real line and you look only at
distributions with that support, then the current fit method works
pretty well. Problems arise for distributions whose support has a
finite boundary point.
Also, stats.distributions has only univariate distributions; there is
no support for multivariate distributions.
I have also written several extension distributions (also univariate
only), which are not yet in scipy.
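
For reference, the fit methods estimate parameters by maximum
likelihood. A small sketch with synthetic data follows; the sample
values are made up for illustration. The floc keyword, which holds the
location fixed, is available in current SciPy releases and may be
missing in older ones. Fixing the location is one way to sidestep the
boundary problem when the lower bound of the support is known.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Full support on the real line: fit returns (loc, scale) for the normal.
x = rng.normal(loc=-1.0, scale=2.0, size=5000)
loc_hat, scale_hat = stats.norm.fit(x)
print("normal: loc = %.3f, scale = %.3f" % (loc_hat, scale_hat))

# Support bounded below at 0: hold the location fixed with floc=0 so the
# optimizer does not have to estimate the boundary point.
g = rng.gamma(shape=3.0, scale=2.0, size=5000)
a_hat, loc_g, scale_g = stats.gamma.fit(g, floc=0)
print("gamma: a = %.3f, loc = %.3f, scale = %.3f" % (a_hat, loc_g, scale_g))
```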

What exactly do you mean by "sum of multiple Gaussians"? If I take
it literally as the sum of several independent, normally distributed
random variables, then the distribution would just be normal again.
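
To illustrate the two possible readings, here is a small simulation.
The parameter values and the 50/50 mixture are assumptions for the
example, not something taken from the question. A sum of two
independent normals is again normal, with mean mu1 + mu2 and variance
s1**2 + s2**2, while a mixture of the same two components is bimodal
and clearly not normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 20000
mu1, s1, mu2, s2 = -3.0, 1.0, 3.0, 1.0  # illustrative values only

# Reading 1: the sum of two independent normal random variables.
x_sum = rng.normal(mu1, s1, n) + rng.normal(mu2, s2, n)

# Reading 2: a 50/50 mixture, each point drawn from one of the components.
pick = rng.random(n) < 0.5
x_mix = np.where(pick, rng.normal(mu1, s1, n), rng.normal(mu2, s2, n))


def ks_to_normal(x):
    """KS distance between x and a normal with x's own mean and std."""
    return stats.kstest(x, "norm", args=(x.mean(), x.std()))[0]


ks_sum = ks_to_normal(x_sum)
ks_mix = ks_to_normal(x_mix)
print("sum:     mean = %.3f, var = %.3f, KS = %.4f"
      % (x_sum.mean(), x_sum.var(), ks_sum))
print("mixture: mean = %.3f, var = %.3f, KS = %.4f"
      % (x_mix.mean(), x_mix.var(), ks_mix))
```

The KS distance to a single normal should come out small for the sum
and large for the mixture.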

If you provide some more information on the structure of your data, I
would be better able to see if scipy.stats can handle them.

Josef

