[SciPy-user] PyEM: custom (non-Euclidean) distance function?

josef.pktd@gmai...
Mon Mar 16 13:05:21 CDT 2009


>
>> I don't know about EM applications, but from a maximum likelihood
>> viewpoint, it might be possible to find the distribution class for
>> the mixture that corresponds to different kinds of distance measures
>> or that is appropriate for discrete data.
>
> EM (for MLE) is applicable to many hidden-variable models in the
> exponential family (that is, when the complete data follow a density
> in the exponential family). So it is definitely much more general
> than GMM, and can be applied to discrete data (for example, mixtures
> of multinomials). In my own field, speech processing, the EM
> algorithm is applied to both continuous data (GMM and HMM with GMM
> emission densities for acoustic modelling) and discrete data (for
> language modelling).
>
> I am still not sure I understand how distance enters in that context, though.
>

I only have a vague intuition, since I have never worked much with
non-probabilistic models in this area. But I am thinking of the
similarity between an iso-distance contour and an iso-likelihood
contour, and of the fact that classifying a point as belonging to one
of the mixture components depends on the likelihood or posterior
ratio.

A simple case: if one of the variables is in logs, then the joint
distribution would be normal-lognormal, and the corresponding distance
measure would need a log scale on one axis. The likelihood ratio or
posterior probability for which mixture component an observed point
belongs to would then operate on different scales, and would not
correspond to a Euclidean distance in the raw random variables unless
they are correctly transformed first. A similar intuition should work
for other non-linear transformations. But I don't know whether the
commonly used distance measures make sense under this interpretation.
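
As a hedged illustration of that paragraph (again my own construction,
with made-up component centers): when the second coordinate is
lognormal, the appropriate distance uses a log scale on that axis, and
the nearest center can change once the axis is transformed.

import numpy as np

# two component centers, given as (normal coordinate, lognormal median)
centers = np.array([[0.0, np.exp(0.0)], [1.0, np.exp(3.0)]])
point = np.array([0.8, np.exp(2.0)])

def to_model_scale(p):
    # log-transform the second axis so both axes are on the normal scale
    return np.array([p[0], np.log(p[1])])

raw = [np.linalg.norm(point - c) for c in centers]
transformed = [np.linalg.norm(to_model_scale(point) - to_model_scale(c))
               for c in centers]

print("nearest (raw scale):       ", int(np.argmin(raw)))          # center 0
print("nearest (log-transformed): ", int(np.argmin(transformed)))  # center 1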

Josef

