[SciPy-dev] PyEM, toolbox for Expectation Maximization for Gaussian Mixtures (proposal for inclusion into scipy.sandbox)
David Cournapeau
david at ar.media.kyoto-u.ac.jp
Fri Oct 6 09:36:05 CDT 2006
Hi,
A few months ago, I posted a preliminary version of PyEM, a numpy
package for Expectation Maximization for Gaussian Mixture Models. As it
was developed while several core numpy functions were changing
(axis convention in mean, sum, etc.), I stopped doing public
releases.
Now that the numpy API has settled, I propose the package for
inclusion into the scipy sandbox. The package has already been used
successfully by several other people; I tried to keep the API coherent
and easy to use, and I have included some pretty plotting functions :).
By including it in scipy, I also hope to get some feedback on usage,
possible improvements, more testing, etc.
* DOWNLOAD:
The scipy version is available here:
http://www.ar.media.kyoto-u.ac.jp/members/david/pyem/pyem-scipy-0.5.3.tar.gz
* INSTALLATION INSTRUCTIONS:
I don't know the best way to package this so that it is
included in scipy: to make it work, you just need to uncompress the
archive and move the directory to Lib/sandbox/pyem. An example script,
example.py, is included in the archive. Some preliminary tests are
also included (including tests for the ctypes version, which is not
enabled by default and requires a recent version of ctypes).
* EXAMPLE USAGE:
import numpy as N
from scipy.sandbox.pyem import GM, GMM, EM
#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# Create an artificial 2 dimension, 3 component GMM model, sample it
#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
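d = 2
k = 3
mode = 'diag'    # covariance type, shared by the true and learned models
nframes = 1000   # number of frames to sample (arbitrary value)
w, mu, va = GM.gen_param(d, k, mode, spread = 1.5)
# GM.fromvalues is a class method
gm = GM.fromvalues(w, mu, va)
# Sample nframes frames from the model
data = gm.sample(nframes)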
#++++++++++++++++++++++++
# Learn the model with EM
#++++++++++++++++++++++++
# Init the model: here we create a mixture from its meta-parameters only
# (dimension, number of components) using the GM "ctor"
lgm = GM(d, k, mode)
# Create a model to be trained from a mixture, with kmean for initialization
gmm = GMM(lgm, 'kmean')
gmm.init(data)
# The actual EM, with likelihood computation. The threshold
# is compared to the (linearly approximated) derivative of the likelihood
em = EM()
like = em.train(data, gmm, maxiter = 30, thresh = 1e-8)
# "Trained" parameters are available through gmm.gm.w, gmm.gm.mu, gmm.gm.va
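For readers who want to see what the training step does conceptually,
here is a rough numpy-only sketch of one EM iteration for a
diagonal-covariance mixture. It is not PyEM's code: the names
log_gauss_diag and em_step, the toy data and the starting values are all
made up for illustration, and only the standard textbook updates are
shown.

```python
import numpy as np

def log_gauss_diag(data, mu, va):
    """Log-density of each frame under each diagonal Gaussian.

    data: (n, d), mu: (k, d), va: (k, d). Returns an (n, k) array.
    """
    diff = data[:, None, :] - mu[None, :, :]               # (n, k, d)
    log_det = np.sum(np.log(2 * np.pi * va), axis=1)       # (k,)
    maha = np.sum(diff ** 2 / va[None, :, :], axis=2)      # (n, k)
    return -0.5 * (log_det[None, :] + maha)

def em_step(data, w, mu, va):
    """One EM iteration (hypothetical helper, not the PyEM API).

    Returns the updated (w, mu, va) and the log-likelihood of the data
    under the parameters that were passed in.
    """
    n = data.shape[0]
    # E step: responsibilities, computed in the log domain for stability
    log_p = np.log(w)[None, :] + log_gauss_diag(data, mu, va)
    log_norm = np.logaddexp.reduce(log_p, axis=1)          # (n,)
    resp = np.exp(log_p - log_norm[:, None])               # (n, k)
    # M step: weighted re-estimation of weights, means and variances
    nk = resp.sum(axis=0)                                  # (k,)
    w_new = nk / n
    mu_new = (resp.T @ data) / nk[:, None]
    diff = data[:, None, :] - mu_new[None, :, :]
    va_new = np.einsum('nk,nkd->kd', resp, diff ** 2) / nk[:, None]
    return w_new, mu_new, va_new, log_norm.sum()

# Toy data: two well separated 2-d clusters (arbitrary values)
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(-3.0, 1.0, size=(200, 2)),
                  rng.normal(3.0, 0.5, size=(300, 2))])
w = np.array([0.5, 0.5])
mu = np.array([[-1.0, -1.0], [1.0, 1.0]])
va = np.ones((2, 2))
likes = []
for _ in range(20):
    w, mu, va, like = em_step(data, w, mu, va)
    likes.append(like)
```

The log-likelihood values collected in likes should never decrease from
one iteration to the next, so a stopping rule can compare successive
values against a threshold. Whether that matches the exact criterion
used by em.train is an assumption on my side.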
* PLOTTING EXAMPLES:
http://www.ar.media.kyoto-u.ac.jp/members/david/pyem/example_1_dimension_mode_diag.png
http://www.ar.media.kyoto-u.ac.jp/members/david/pyem/example_2_dimension_mode_diag.png
* FUTURE:
I use the package myself quite regularly, and intend to improve it in
the near future:
- a script online_em.py for online EM for reinforcement learning is
included, but it is not enabled by default: it is beta, the API is
awkward, and it is not likely to work really well for now.
- inclusion of priors to avoid covariance shrinking toward 0.
- I started to code some core functions in C with ctypes (this can be
enabled in the file gmm_em.py by uncommenting the line
#import _c_densities as densities
and commenting out the line import densities).
- Ideally, I was hoping to start a project of numpy packages for
Machine Learning (Kalman filtering, HMM, etc...); I don't know if other
people would be interested in developing such a package.
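On the prior item in the list above: one simple way to keep variances
away from 0 is to blend each component's weighted scatter with a fixed
prior variance, weighted by a pseudo-count. The sketch below only
illustrates that idea. It is not what PyEM does today, and the function
name and the parameters prior_va and prior_count are hypothetical.

```python
import numpy as np

def regularized_variance(data, resp, mu, prior_va=1.0, prior_count=1.0):
    """Variance update smoothed toward prior_va (illustrative only).

    data: (n, d) frames, resp: (n, k) responsibilities, mu: (k, d) means.
    prior_count acts like a number of virtual frames with variance
    prior_va added to every component.
    """
    nk = resp.sum(axis=0)                                  # (k,)
    diff = data[:, None, :] - mu[None, :, :]               # (n, k, d)
    scatter = np.einsum('nk,nkd->kd', resp, diff ** 2)     # (k, d)
    return (scatter + prior_count * prior_va) / (nk[:, None] + prior_count)

# Degenerate case: each component owns exactly one frame, so the plain
# maximum-likelihood variance would be exactly 0.
data = np.array([[0.0, 0.0], [5.0, 5.0]])
resp = np.eye(2)
mu = data.copy()
va = regularized_variance(data, resp, mu)
```

With the defaults, a component that collapses onto a single frame keeps
a variance of prior_count * prior_va / (1 + prior_count) instead of 0.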
Cheers,
David