[SciPy-user] Fitting an arbitrary distribution
Dav Clark
dav@alum.mit....
Thu May 21 21:55:54 CDT 2009
On May 21, 2009, at 6:58 PM, David Cournapeau wrote:
> David Baddeley wrote:
>> Hi all,
>>
>> I want to fit an arbitrary distribution (in this case the sum of
>> multiple Gaussians) to some measured data and was wondering if
>> anyone could give me any pointers as to the best way of doing this.
>> I'd like to avoid fitting to a histogram if possible. How do
>> the .fit() methods of the various distributions under scipy.stats
>> do it? My first thought would be to compare the cumulative
>> distribution of my data with that of the model distribution using
>> something like the Kolmogorov-Smirnov metric (maximum absolute
>> distance between the curves) and to minimize this using
>> optimize.fmin. Is this the right way to do it? Or is there an
>> easier way?
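For concreteness, the idea described above (minimizing the Kolmogorov-Smirnov distance with optimize.fmin) could be sketched roughly as follows. This is only an illustration under stated assumptions: a two-component mixture, a made-up parameter layout [w, mu1, sigma1, mu2, sigma2], hypothetical helper names (mixture_cdf, ks_distance), and synthetic data. Note that this is a minimum-distance fit, which is not what the scipy.stats .fit() methods do; as far as I know those use maximum likelihood.

```python
import numpy as np
from scipy import optimize, stats

def mixture_cdf(x, params):
    # Assumed layout: params = [w, mu1, sigma1, mu2, sigma2];
    # the second component gets weight 1 - w.
    w, mu1, s1, mu2, s2 = params
    return (w * stats.norm.cdf(x, loc=mu1, scale=s1)
            + (1.0 - w) * stats.norm.cdf(x, loc=mu2, scale=s2))

def ks_distance(params, data_sorted):
    # Maximum absolute distance between the empirical CDF and the model CDF.
    w, _, s1, _, s2 = params
    if not (0.0 < w < 1.0) or s1 <= 0 or s2 <= 0:
        return 1.0  # penalty: largest possible KS distance
    n = len(data_sorted)
    model = mixture_cdf(data_sorted, params)
    # The empirical CDF jumps at each data point, so check both sides of the step.
    above = np.arange(1, n + 1) / n - model
    below = model - np.arange(0, n) / n
    return max(above.max(), below.max())

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
data = np.sort(np.concatenate([rng.normal(-2.0, 0.5, 200),
                               rng.normal(2.0, 1.0, 300)]))

start = np.array([0.5, -1.0, 1.0, 1.0, 1.0])
d_start = ks_distance(start, data)
fit = optimize.fmin(ks_distance, start, args=(data,), disp=False)
d_fit = ks_distance(fit, data)
```

The objective is not smooth, so a derivative-free simplex search like fmin is a natural choice, but it can stall in a local minimum; the result depends on the starting guess.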
>
> That's a complex topic in general. There is no single best answer; it
> depends on your case and on what you intend to do with the estimated
> distribution.
>
> In the case of a sum of multiple Gaussians, the more commonly used name
> for this model is a mixture model, and there is a vast range of possible
> techniques for fitting such a model to a dataset. There is a package in
> scikits.learn that uses the so-called Expectation Maximization (EM)
> algorithm to compute maximum likelihood estimates for such models:
>
> http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/
There's actually a broken link on that page where it points to the em
page for the scikits project. That now lives here:
http://scikits.appspot.com/
Depending on what exactly you want to do, you may also want to check
out PyMC for Metropolis-Hastings sampling:
http://code.google.com/p/pymc/
My guess is that you're looking for the em package though.
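To give a feel for what EM does for a one-dimensional Gaussian mixture, here is a minimal NumPy sketch. It is not the em package's API or implementation; the function name em_gmm_1d, its parameters, and the synthetic data are all made up for illustration, and it assumes the number of components k is known in advance.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=200, seed=0):
    """Hypothetical helper: fit a k-component 1-D Gaussian mixture by EM."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    # Crude initialisation: equal weights, means drawn from the data,
    # every variance set to the overall sample variance.
    weights = np.full(k, 1.0 / k)
    means = rng.choice(x, size=k, replace=False)
    variances = np.full(k, x.var())
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point,
        # computed in log space for numerical stability.
        diff = x[:, None] - means[None, :]
        log_r = (np.log(weights)
                 - 0.5 * (np.log(2 * np.pi * variances) + diff**2 / variances))
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances from the
        # responsibility-weighted data.
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r * x[:, None]).sum(axis=0) / nk
        variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-9
    order = np.argsort(means)
    return weights[order], means[order], variances[order]

# Synthetic data, purely for illustration.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 0.5, 400), rng.normal(2.0, 1.0, 600)])
weights, means, variances = em_gmm_1d(data, k=2)
```

This works on the raw samples, so no histogram is needed. A real implementation would also monitor the log-likelihood for convergence and guard against components that collapse or lose all their points.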
Cheers,
Dav