[SciPy-user] Fitting an arbitrary distribution

Dav Clark dav@alum.mit....
Thu May 21 21:55:54 CDT 2009


On May 21, 2009, at 6:58 PM, David Cournapeau wrote:

> David Baddeley wrote:
>> Hi all,
>>
>> I want to fit an arbitrary distribution (in this case the sum of  
>> multiple Gaussians) to some measured data and was wondering if  
>> anyone could give me any pointers as to the best way of doing this.  
>> I'd like to avoid fitting to a histogram if possible. How do  
>> the .fit() methods of the various distributions under scipy.stats  
>> do it? My first thought would be to compare the cumulative  
>> distribution of my data with that of the model distribution using  
>> something like the Kolmogorov-Smirnov metric (maximum absolute  
>> distance between the curves) and to minimize this using  
>> optimize.fmin. Is this the right way to do it? Or is there an  
>> easier way?
>
> That's a complex topic in general. There is no best answer; it depends
> on your case and on what you intend to do with the estimated
> distribution.
>
> In the case of a sum of multiple Gaussians, the more commonly used name
> for this model is a mixture model, and there is a vast range of possible
> techniques for fitting such a model to a dataset. There is a package in
> scikits.learn that uses the so-called Expectation Maximization algorithm
> to compute maximum likelihood estimates of the parameters of such models:
>
> http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/

The link on that page to the em page for the scikits project is  
actually broken. That page is now here:

http://scikits.appspot.com/

Depending on what exactly you want to do, you may also want to check  
out PyMC for Metropolis-Hastings sampling.

http://code.google.com/p/pymc/
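
To give a rough idea of what Metropolis-Hastings sampling involves, here
is a minimal sketch in plain Python. It is not PyMC's API. The names
log_posterior and metropolis are made up for illustration, the model is
a single Gaussian with flat priors on the mean and on log(sigma), and
the proposal is a symmetric random walk (the plain Metropolis special
case). A sum of Gaussians would need a longer parameter list and a
matching log-likelihood.

```python
import math
import random


def log_posterior(params, data):
    """Log-posterior of (mu, log_sigma) for Gaussian data, up to a constant.

    Assumes flat priors on mu and on log(sigma), so this is just the
    log-likelihood with the constant terms dropped.
    """
    mu, log_sigma = params
    sigma = math.exp(log_sigma)
    return sum(-0.5 * ((x - mu) / sigma) ** 2 - log_sigma for x in data)


def metropolis(log_post, start, n_samples, step, rng):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    current = list(start)
    current_lp = log_post(current)
    chain = []
    for _ in range(n_samples):
        proposal = [c + rng.gauss(0.0, step) for c in current]
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain.append(list(current))
    return chain


rng = random.Random(1)
data = [rng.gauss(1.5, 0.7) for _ in range(200)]

chain = metropolis(lambda p: log_posterior(p, data),
                   start=[0.0, 0.0], n_samples=5000, step=0.05, rng=rng)
kept = chain[1000:]  # discard the burn-in part of the chain

post_mu = sum(p[0] for p in kept) / len(kept)
post_sigma = sum(math.exp(p[1]) for p in kept) / len(kept)
print(post_mu, post_sigma)
```

Unlike a point estimate, the chain gives you a whole set of plausible
parameter values, so you can read off uncertainties as well.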

My guess is that you're looking for the em package though.
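
For a rough idea of what EM does for a 1-D sum of Gaussians, here is a
minimal sketch in plain Python that works on the raw samples, with no
histogram. It is not the em package's code or API. The names normal_pdf
and fit_gaussian_mixture, the quantile-based starting values and the
min_sigma floor are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gaussian_mixture(data, k=2, n_iter=100, min_sigma=1e-3):
    """Fit a 1-D mixture of k Gaussians to raw samples with EM.

    Returns (weights, means, sigmas), ordered by increasing mean.
    """
    data = sorted(data)
    n = len(data)
    # Crude starting point: means at evenly spaced quantiles of the data.
    means = [data[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(data) / n
    spread = math.sqrt(sum((x - mean_all) ** 2 for x in data) / n)
    sigmas = [max(spread / k, min_sigma)] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s)
                 for w, m, s in zip(weights, means, sigmas)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M step: re-estimate weight, mean and sigma of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj <= 0.0:
                continue  # component got no support; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            # Floor sigma so a component cannot collapse onto one point.
            sigmas[j] = max(math.sqrt(var), min_sigma)

    order = sorted(range(k), key=lambda j: means[j])
    return ([weights[j] for j in order],
            [means[j] for j in order],
            [sigmas[j] for j in order])


rng = random.Random(0)
samples = ([rng.gauss(-2.0, 0.5) for _ in range(600)]
           + [rng.gauss(3.0, 1.0) for _ in range(400)])
weights, means, sigmas = fit_gaussian_mixture(samples, k=2)
print(weights, means, sigmas)
```

EM only finds a local maximum of the likelihood, so with real data it is
worth trying several starting points and keeping the best fit.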

Cheers,
Dav

