[SciPy-user] Fitting an arbitrary distribution

josef.pktd@gmai... josef.pktd@gmai...
Thu May 21 21:41:37 CDT 2009


On Thu, May 21, 2009 at 10:33 PM,  <josef.pktd@gmail.com> wrote:
> On Thu, May 21, 2009 at 9:58 PM, David Cournapeau
> <david@ar.media.kyoto-u.ac.jp> wrote:
>> David Baddeley wrote:
>>> Hi all,
>>>
>>> I want to fit an arbitrary distribution (in this case the sum of multiple Gaussians) to some measured data and was wondering if anyone could give me any pointers as to the best way of doing this. I'd like to avoid fitting to a histogram if possible. How do the .fit() methods of the various distributions under scipy.stats do it? My first thought would be to compare the cumulative distribution of my data with that of the model distribution using something like the Kolmogorov-Smirnov metric (maximum absolute distance between the curves) and to minimize this using optimize.fmin. Is this the right way to do it? Or is there an easier way?
>>
>> That's a complex topic in general: there is no single best answer. It
>> depends on your case and on what you intend to do with the estimated
>> distribution.
>>
>> In the case of a sum of multiple Gaussians, the more commonly used name
>> for this model is a mixture model, and there is a vast range of possible
>> techniques for fitting this model to a dataset. There is a package in
>> scikits.learn that uses the Expectation-Maximization (EM) algorithm to
>> compute maximum-likelihood estimates for such models:
>>
>> http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/
>>
>> You can find an overview on the Wikipedia page:
>>
>> http://en.wikipedia.org/wiki/Mixture_model
>>
>
> A sum of independent random variables has a density that is the
> convolution of the individual densities, which is very different from a
> mixture of distributions. I got confused in a discussion today when the
> other person talked about convolutions and I was thinking about
> mixtures, and it didn't make a lot of sense.
>
> So, which is it?
>

Actually, "Gaussians" is ambiguous in this context: does it mean random
variables, or does it refer to the density/distribution functions?
A sum of random variables is very different from a (weighted) sum of
distribution functions, and both are possible interpretations of "sum
of Gaussians".
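
To make the two readings concrete, here is a minimal NumPy sketch. The
component parameters, the weight w and all variable names are made up for
illustration; nothing here comes from the actual data in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two Gaussian components (illustrative parameters, assumed for this sketch).
mu1, sigma1 = -2.0, 1.0
mu2, sigma2 = 3.0, 0.5
w = 0.5  # mixture weight of the first component (assumed)


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma**2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


# Reading 1: sum of two independent random variables, Z = X + Y.
# The density of Z is the convolution of the two densities; for Gaussians
# this is again a single Gaussian, N(mu1 + mu2, sigma1**2 + sigma2**2).
z = rng.normal(mu1, sigma1, n) + rng.normal(mu2, sigma2, n)

# Reading 2: mixture, i.e. the density is the weighted sum w*f1 + (1 - w)*f2.
# Each draw comes from component 1 with probability w, else from component 2.
from_first = rng.random(n) < w
m = np.where(from_first,
             rng.normal(mu1, sigma1, n),
             rng.normal(mu2, sigma2, n))


def mixture_pdf(x):
    """Weighted sum of the two component densities."""
    return w * normal_pdf(x, mu1, sigma1) + (1 - w) * normal_pdf(x, mu2, sigma2)


sum_mean, sum_var = z.mean(), z.var()
mix_mean, mix_var = m.mean(), m.var()

print("sum:     mean %.3f  var %.3f" % (sum_mean, sum_var))
print("mixture: mean %.3f  var %.3f" % (mix_mean, mix_var))
```

With these parameters the sum is unimodal, with mean mu1 + mu2 and variance
sigma1**2 + sigma2**2, while the mixture is bimodal, with peaks near mu1 and
mu2 and very little mass in between. Which likelihood one would write down
for fitting depends on which of the two is meant.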

Josef

