[SciPy-user] PyEM: custom (non-Euclidean) distance function?

josef.pktd@gmai... josef.pktd@gmai...
Mon Mar 16 11:28:30 CDT 2009

```On Mon, Mar 16, 2009 at 12:05 PM, Emanuele Olivetti
<emanuele@relativita.com> wrote:
> Emanuele Olivetti wrote:
>> Hi All,
>>
>> I'm playing with PyEM [0] in scikits and would like to feed
>> a dataset for which Euclidean distance is not supposed to
>> work. So I'm wondering how simple it is to modify the code to use
>> a custom distance (e.g., 1-norm).
>>
>>
>
> Additional info. My final goal is to run the EM algorithm
> and estimate the Gaussian mixture from data, but assuming
> a different distance function. I had a look at densities.py,
> which seems to be the relevant file for this question. I
> can see the computation of Euclidean distance in:
> - _scalar_gauss_den()
> - _diag_gauss_den()
> - _full_gauss_den()
>
>
> So the question is: if I change those functions to use a
> new distance function, can the EM estimation in em.train() be
> expected to work meaningfully? Are there other parts of PyEM
> that assume a Euclidean distance function?
>
>
> Emanuele

I don't know the answer, but I'm curious about your data and why
Euclidean distance is not appropriate for it.

A Gaussian mixture is built on the normal distribution for
continuous random variables, whose density is defined through the
Euclidean distance to the mean, or through a variant of it scaled by
the covariance matrix (the Mahalanobis distance). So there seems to me
to be a conflict in trying to fit a Gaussian mixture to data for which
these distance calculations are not meaningful. If the data really is
that different, then a Gaussian mixture might not be appropriate.

From a quick look, gmm_em.py and gauss_mix.py are specialized to the
normal distribution and fully parametric, and I'm not sure what
distribution you get if you just change the distance function. My
impression is that correctly allowing for other distributions would
require more far-reaching changes than just swapping the distance
function.

Josef
```
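To make the point about the distance function concrete, here is a rough NumPy sketch. It is not PyEM code: the function names (diag_gauss_logpdf, naive_l1_logpdf, laplace_logpdf, total_mass_1d) are hypothetical and only loosely mirror what a diagonal Gaussian density computes. It assumes a diagonal covariance, and it assumes that "1-norm distance" means replacing the scaled squared difference with a scaled absolute difference. Under those assumptions, swapping the distance while keeping the Gaussian normalizing constant gives a function that no longer integrates to 1, whereas a properly normalized 1-norm density is a product of Laplace densities, which is a different distribution.

```python
import numpy as np


def diag_gauss_logpdf(x, mu, var):
    """Log-density of a Gaussian with diagonal covariance (variances `var`)."""
    x, mu, var = (np.asarray(a, dtype=float) for a in (x, mu, var))
    # Squared Euclidean distance after scaling each axis by its variance.
    d2 = np.sum((x - mu) ** 2 / var, axis=-1)
    log_norm = -0.5 * (mu.shape[-1] * np.log(2.0 * np.pi) + np.sum(np.log(var)))
    return log_norm - 0.5 * d2


def naive_l1_logpdf(x, mu, var):
    """Gaussian normalizer kept, squared distance swapped for a scaled 1-norm.

    Hypothetical "just change the distance" variant; not a valid log-density.
    """
    x, mu, var = (np.asarray(a, dtype=float) for a in (x, mu, var))
    d1 = np.sum(np.abs(x - mu) / np.sqrt(var), axis=-1)
    log_norm = -0.5 * (mu.shape[-1] * np.log(2.0 * np.pi) + np.sum(np.log(var)))
    return log_norm - 0.5 * d1


def laplace_logpdf(x, mu, b):
    """Log-density of independent Laplace components with scales `b`."""
    x, mu, b = (np.asarray(a, dtype=float) for a in (x, mu, b))
    d1 = np.sum(np.abs(x - mu) / b, axis=-1)
    log_norm = -np.sum(np.log(2.0 * b))  # normalizer differs from the Gaussian one
    return log_norm - d1


def total_mass_1d(logpdf, mu, scale):
    """Riemann-sum estimate of the integral of exp(logpdf) in one dimension."""
    grid = np.linspace(-60.0, 60.0, 240001)
    dx = grid[1] - grid[0]
    return float(np.sum(np.exp(logpdf(grid[:, None], mu, scale))) * dx)


if __name__ == "__main__":
    for f in (diag_gauss_logpdf, naive_l1_logpdf, laplace_logpdf):
        print(f.__name__, total_mass_1d(f, [0.0], [1.0]))
```

Running the script prints the estimated total mass of each function in one dimension: the Gaussian and Laplace versions should come out close to 1, while the naive swap should not. Beyond the density itself, the parameter updates would change as well. For a Laplace component the maximum-likelihood location is the median rather than the mean, so an EM implementation written around Gaussian formulas would need a different M-step, not only a different density function.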