[SciPy-user] PyEM: custom (non-Euclidean) distance function?

josef.pktd@gmai... josef.pktd@gmai...
Mon Mar 16 13:31:03 CDT 2009


On Mon, Mar 16, 2009 at 2:05 PM,  <josef.pktd@gmail.com> wrote:
>>
>>> I don't know about EM applications, but from a maximum likelihood
>>> viewpoint, it might be possible to find the distribution class for the
>>> mixture that corresponds to different kinds of distance measures or
>>> that is appropriate for discrete data.
>>
>> EM (for MLE) is applicable to many models within the exponential
>> hidden family (that is when the complete data follow a density in the
>> exponential family). So it is definitely much more general than GMM,
>> and can be applied to discrete data (for example mixture of
>> multinomials). In my own field, speech processing, the EM algorithm is
>> applied to both continuous data (GMM and HMM with GMM emission
>> densities for acoustic modelling) and discrete data (for language
>> modelling).
>>
>> I am still not sure I understand how distance comes into that context, though.
>>
>
> I only have a vague intuition, since I never worked much with
> non-probabilistic models in this area. But I'm thinking of the
> similarity of an iso-distance contour with an iso-likelihood contour
> and that classifying a point as belonging to one of the mixture
> distributions depends on the likelihood or posterior ratio.
>
> A simple case, if one of the variables is in logs, then the joint
> distribution would be normal-lognormal and the corresponding distance
> measure would need log-scale on one axis.

small correction to sloppy phrasing:
"If X is a random variable with a normal distribution, then Y = exp(X)
has a log-normal distribution; likewise, if Y is log-normally
distributed, then log(Y) is normally distributed." (Wikipedia)
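
To make the relation concrete, here is a minimal sketch (my own illustration, not PyEM code; the values of mu and sigma are arbitrary assumptions). It checks numerically that the log-normal log-density of y equals the normal log-density of log(y) minus log(y), the Jacobian term of the transformation:

```python
import numpy as np
from scipy import stats

# Hypothetical parameters of the underlying normal variable X = log(Y).
mu, sigma = 0.5, 0.8

# A few positive test points on the original (non-log) scale.
y = np.array([0.2, 1.0, 3.0, 10.0])

# Log-density of Y under scipy's log-normal parameterisation:
# shape s = sigma, scale = exp(mu).
log_pdf_y = stats.lognorm.logpdf(y, s=sigma, scale=np.exp(mu))

# Same quantity via the normal density of log(y), corrected by the
# Jacobian of the transformation, d log(y) / dy = 1 / y.
log_pdf_via_normal = stats.norm.logpdf(np.log(y), loc=mu, scale=sigma) - np.log(y)

max_abs_diff = np.max(np.abs(log_pdf_y - log_pdf_via_normal))
print(max_abs_diff)
```

The Jacobian term depends only on y, not on the parameters, so it cancels whenever two components that share the same transformation are compared.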

> The likelihood ratio or posterior probability that decides which
> mixture distribution an observed point belongs to would then work on
> different scales, and not with a Euclidean distance in the random
> variables, unless they are correctly transformed. A similar
> intuition should work for other non-linear transformations. But I
> don't know if the commonly used distance measures would make sense in
> this interpretation.
>
> Josef
>
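
A small sketch of the intuition above, under assumptions of my own choosing: two equally weighted components, unit variances, the first variable normal and the second log-normal. The component parameters and the test point are hypothetical, and this is plain NumPy/SciPy rather than PyEM. The posterior is computed from the normal density on (x1, log x2), which amounts to a squared Euclidean distance in the transformed coordinates; the Jacobian 1/x2 is common to both components and cancels.

```python
import numpy as np
from scipy import stats

# Hypothetical component means on the (x1, log x2) scale, unit variances.
log_means = np.array([[0.0, 0.0],
                      [0.0, np.log(100.0)]])
weights = np.array([0.5, 0.5])


def to_normal_scale(points):
    """Map (x1, x2) to (x1, log x2), where both coordinates are normal."""
    points = np.asarray(points, dtype=float)
    return np.column_stack([points[:, 0], np.log(points[:, 1])])


def posterior(points):
    """Posterior probability of each component for each point."""
    z = to_normal_scale(points)
    log_joint = np.column_stack([
        np.log(w) + stats.norm.logpdf(z, loc=m, scale=1.0).sum(axis=1)
        for w, m in zip(weights, log_means)
    ])
    # Subtract the row maximum before exponentiating, for numerical stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)


def nearest_center_raw(points):
    """Index of the nearest component centre using Euclidean distance
    on the untransformed scale (centres taken at the component medians)."""
    points = np.asarray(points, dtype=float)
    centers = np.column_stack([log_means[:, 0], np.exp(log_means[:, 1])])
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


points = np.array([[0.0, 20.0]])
by_posterior = posterior(points).argmax(axis=1)
by_raw_distance = nearest_center_raw(points)
print(by_posterior, by_raw_distance)
```

For the point (0, 20), the raw Euclidean distance picks the component centred at (0, 1), while the posterior picks the one centred at (0, 100), because 20 is closer to 100 than to 1 on the log scale.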

