[IPython-dev] Suggestions for implementing parallel algorithms?
fullung at gmail.com
Thu Nov 9 16:50:42 CST 2006
I'm getting started with using IPython1 and mpi4py for implementing parallel
algorithms for speaker verification. I'm mostly dealing with data from the
NIST Speaker Recognition Evaluation, which entails thousands of speaker
models to train and test.
There are two expectation-maximization algorithms that are of interest to
me: k-means clustering and Gaussian Mixture Model training. Both of these
can be implemented in parallel by scattering the training data over a bunch
of nodes, calculating some statistics, combining the stats in some way and
repeating this process for a maximum number of iterations or until
convergence is attained.
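To make the "calculate some statistics, combine them, repeat" pattern concrete, here is a minimal NumPy sketch of one EM iteration for a diagonal-covariance GMM, with each data chunk standing in for a node. It is not the implementation discussed in this post. The function names (`gmm_chunk_stats`, `gmm_m_step`, `em_iteration`), the variance floor and the diagonal-covariance assumption are illustrative choices. The point it shows is that the per-chunk statistics are plain sums, so combining them across nodes is just addition.

```python
import numpy as np


def gmm_chunk_stats(x, weights, means, variances):
    """E-step statistics for one chunk of frames (x: n x d) under a
    diagonal-covariance GMM with k components."""
    # Log-density of every frame under every component, shape (n, k).
    diff = x[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances[None, :, :], axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_joint = log_gauss + np.log(weights)
    # Posterior (responsibility) of each component for each frame.
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    post = np.exp(log_joint - log_norm)
    n = post.sum(axis=0)        # zeroth-order statistics, shape (k,)
    f = post.T @ x              # first-order statistics, shape (k, d)
    s = post.T @ (x ** 2)       # second-order statistics, shape (k, d)
    return n, f, s, float(log_norm.sum())


def gmm_m_step(n, f, s, var_floor=1e-3):
    """Re-estimate parameters from statistics summed over all chunks.
    Assumes every component has n > 0 (empty components are not handled)."""
    weights = n / n.sum()
    means = f / n[:, None]
    variances = np.maximum(s / n[:, None] - means ** 2, var_floor)
    return weights, means, variances


def em_iteration(chunks, weights, means, variances):
    """One EM iteration: stats per chunk (one chunk per node), summed, then M-step.
    Returns the new parameters and the log-likelihood under the old ones."""
    per_chunk = [gmm_chunk_stats(c, weights, means, variances) for c in chunks]
    n, f, s, loglik = (sum(stat) for stat in zip(*per_chunk))
    return gmm_m_step(n, f, s), loglik
```

On a cluster, only the list comprehension in `em_iteration` would run on the nodes; just the small `n`, `f` and `s` arrays need to travel back to be summed, never the frames themselves. Calling `em_iteration` repeatedly until the log-likelihood stops improving gives the full training loop.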
I currently have these algorithms implemented on top of NumPy. For every
algorithm I have a pure-Python version. The Python class implements the
public interface and does argument checking and other housekeeping, and then
defers to "private" methods to get the real work done. To get some decent
speed, some of these "private" methods also have C implementations. To get a
fast version of the algorithm, I mix a class containing only these "private"
methods that call through to C into my pure-Python class. As an example, in
the k-means case, this means I end up with two classes, KMeansEstimator
(pure Python) and CKMeansEstimator (Python on top with some C mixed in).
I would like to adapt these algorithms to run in parallel using IPython1.
Some details about my problem: for training my speaker models, I can simply
train a number of speaker models per node. This parallelises easily --
different nodes do different speakers. However, for training my world model,
I would like to involve all the nodes to work on the same model. This is
necessary because the world model is trained on tens to hundreds of hours
of speech, whereas the speaker models are adapted from the world model using
only a few seconds to a few minutes of speech.
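For the speaker models the split needs no communication between nodes. A hypothetical helper that assigns speakers to nodes round-robin might look like the sketch below. It is an illustrative assumption, not part of IPython1; a real setup might instead balance nodes by the amount of speech per speaker.

```python
def assign_speakers(speaker_ids, n_nodes):
    """Round-robin split of speaker IDs over nodes; each node then adapts
    its own speakers from the world model independently."""
    if n_nodes < 1:
        raise ValueError("need at least one node")
    ids = list(speaker_ids)
    return [ids[i::n_nodes] for i in range(n_nodes)]
```

For example, `assign_speakers(["spk%03d" % i for i in range(10)], 3)` gives node 0 the speakers `spk000`, `spk003`, `spk006` and `spk009`, and nodes 1 and 2 three speakers each.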
A naive way to implement this in parallel would be to copy my existing
implementation and call RemoteController's scatterAll/executeAll/gatherAll
in places where the original algorithm loops over the data (the ...).
I'm hoping I can avoid this duplication. My first idea is to make something
like a LocalController that implements IPython1's IController interface in a
way that makes sense for single-node operation. This way, I can implement my
algorithm once in terms of IController operations, test it easily, and later
decide, simply by setting a controller property on an instance of the class
implementing the algorithm, whether it runs on a single node or in parallel.
How do you guys handle these issues in your code? Any suggestions would be
appreciated.
P.S. For an example implementation of k-means on top of MPI, see