[SciPy-user] PyEM: custom (non-Euclidean) distance function?

josef.pktd@gmai...
Mon Mar 16 12:10:28 CDT 2009


A comment on gmm_em.py:

in
       def _update_em_full(self, data, gamma, ngamma):

there is a triple loop; the inner two loops are:

            # This should be much faster than recursing on n...
            for i in range(d):
                for j in range(d):
                    xx[i, j] = N.sum(data[:, i] * data[:, j] * gamma.T[c, :],
                            axis = 0)

In my reading, data[:, i], data[:, j], and gamma.T[c, :] are all one-dimensional.
If this is correct, then to me this looks like

xx = N.dot(data.T, data * gamma[:,c:c+1])

I'm not completely sure about the shape of gamma, or why you transposed it.

According to a numpy ticket, using dot should be much faster than sum.
This is only from reading the code; I have not actually tested it.
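
One way to check the equivalence would be a small script along these lines.
It is only a sketch: the sizes n, d, k and the random inputs are made up, and
I am assuming data has shape (n, d) and gamma has shape (n, k), which I have
not confirmed against gmm_em.py.

```python
import numpy as N  # same alias as in the quoted gmm_em.py snippet

# Hypothetical sizes: n samples, d dimensions, k mixture components
rng = N.random.RandomState(0)
n, d, k = 50, 3, 4
data = rng.randn(n, d)                  # assumed shape (n, d)
gamma = rng.rand(n, k)                  # assumed shape (n, k)
gamma /= gamma.sum(axis=1)[:, None]     # make each row sum to 1
c = 1                                   # index of one component

# Double loop as quoted from _update_em_full
xx_loop = N.zeros((d, d))
for i in range(d):
    for j in range(d):
        xx_loop[i, j] = N.sum(data[:, i] * data[:, j] * gamma.T[c, :],
                              axis=0)

# Proposed replacement: weight the rows of data by gamma[:, c], then one dot
xx_dot = N.dot(data.T, data * gamma[:, c:c+1])

print(N.allclose(xx_loop, xx_dot))
```

The slice gamma[:, c:c+1] keeps a trailing axis of length 1, so it broadcasts
across the columns of data. If gamma actually has shape (k, n), the weights
would be gamma[c, :][:, None] instead.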


On the distribution:
I don't know about EM applications, but from a maximum likelihood
viewpoint it might be possible to find a distribution class for the
mixture that corresponds to a different kind of distance measure, or
one that is appropriate for discrete data.
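
To illustrate what I mean by a distribution corresponding to a distance: the
negative log-density of an isotropic Gaussian is the squared Euclidean
distance plus a constant, and that of independent Laplace components is the
L1 distance plus a constant. The sketch below only demonstrates this
correspondence. The function names are made up for illustration and are not
part of PyEM.

```python
import numpy as N

def gauss_neg_loglik(x, mu, sigma=1.0):
    # Negative log-density of an isotropic Gaussian with scale sigma:
    # half the squared Euclidean distance (scaled) plus a constant
    d = x.shape[-1]
    sq_dist = N.sum((x - mu) ** 2, axis=-1)
    return 0.5 * sq_dist / sigma ** 2 + 0.5 * d * N.log(2 * N.pi * sigma ** 2)

def laplace_neg_loglik(x, mu, b=1.0):
    # Negative log-density of independent Laplace components with scale b:
    # the L1 distance (scaled) plus a constant
    d = x.shape[-1]
    l1_dist = N.sum(N.abs(x - mu), axis=-1)
    return l1_dist / b + d * N.log(2 * b)

rng = N.random.RandomState(0)
x = rng.randn(5, 2)      # made-up sample points
mu = N.zeros(2)          # made-up component centre

# After subtracting the distance term, only a constant offset remains
gauss_offset = gauss_neg_loglik(x, mu) - 0.5 * N.sum((x - mu) ** 2, axis=-1)
laplace_offset = laplace_neg_loglik(x, mu) - N.sum(N.abs(x - mu), axis=-1)
print(gauss_offset)
print(laplace_offset)
```

So a component density other than the Gaussian would, in effect, change the
distance used by the mixture.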

Josef

