[IPython-dev] Suggestions for implementing parallel algorithms?

Albert Strasheim fullung at gmail.com
Thu Nov 9 16:50:42 CST 2006

Hello all

I'm getting started with using IPython1 and mpi4py for implementing parallel 
algorithms for speaker verification. I'm mostly dealing with data from the 
NIST Speaker Recognition Evaluation, which entails thousands of speaker 
models to train and test.

There are two expectation-maximization algorithms that are of interest to 
me: k-means clustering and Gaussian Mixture Model training. Both of these 
can be implemented in parallel by scattering the training data over a bunch 
of nodes, calculating some statistics, combining the stats in some way and 
repeating this process for a maximum number of iterations or until 
convergence is attained.
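
Just to make the pattern concrete (this is my sketch, not code from the post): the per-node expectation step computes sufficient statistics on its chunk of data, and the combine step sums them and recomputes the parameters. For k-means that could look like:

```python
import numpy as np

def partial_kmeans_stats(chunk, means):
    """E-step on one node's chunk: assign each point to its nearest
    mean and accumulate per-cluster sums and counts."""
    # squared distances, shape (n_points, n_clusters)
    d = ((chunk[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    k, dim = means.shape
    sums = np.zeros((k, dim))
    counts = np.zeros(k)
    for j in range(k):
        mask = labels == j
        sums[j] = chunk[mask].sum(axis=0)
        counts[j] = mask.sum()
    return sums, counts

def combine_and_update(stats, old_means):
    """M-step: sum the statistics gathered from all nodes and
    recompute the means, leaving empty clusters where they were."""
    sums = sum(s for s, _ in stats)
    counts = sum(c for _, c in stats)
    new_means = old_means.copy()
    nonempty = counts > 0
    new_means[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new_means
```

The same scatter/compute/combine shape carries over to GMM training, with the statistics being responsibility-weighted sums instead of hard assignments.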

I currently have these algorithms implemented on top of NumPy. For every 
algorithm I have a pure-Python version. The Python class implements the 
public interface, does argument checking and other housekeeping, and then 
defers to "private" methods to get the real work done. To get some decent 
speed, some of these "private" methods also have C implementations. To get a 
fast version of the algorithm, I mix a class containing only these "private" 
methods (which call through to C) into my pure-Python class. In the k-means 
case, for example, I end up with two classes: KMeansEstimator (pure Python) 
and CKMeansEstimator (Python on top, with some C mixed in).
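
A minimal sketch of that mixin arrangement (my illustration; the class names come from the post, but the kernel bodies are stand-ins, with a flag playing the role of the C call):

```python
import numpy as np

class KMeansEstimator:
    """Pure-Python version: public interface plus slow "private"
    kernels that do the real work."""

    def fit(self, data, means, iterations=5):
        data = np.asarray(data, dtype=float)
        means = np.asarray(means, dtype=float)
        for _ in range(iterations):
            means = self._update_means(data, means)
        return means

    def _update_means(self, data, means):
        # reference implementation of one k-means iteration
        d = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        return np.array([data[labels == j].mean(axis=0)
                         for j in range(len(means))])

class _FastKernels:
    """Mixin holding only the "private" kernels; in the real code
    these would call through to C."""

    def _update_means(self, data, means):
        self.used_fast_kernel = True  # stand-in for the C path
        d = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        return np.array([data[labels == j].mean(axis=0)
                         for j in range(len(means))])

class CKMeansEstimator(_FastKernels, KMeansEstimator):
    """Fast variant: the MRO picks the mixin's kernels while the
    public interface is inherited unchanged."""
```

The point of the arrangement is that only the kernels are swapped; callers see one `fit` interface.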

I would like to adapt these algorithms to run in parallel using IPython1.

Some details about my problem: for training my speaker models, I can simply 
train a number of speaker models per node. This parallelises easily --  
different nodes do different speakers. However, for training my world model, 
I would like to involve all the nodes to work on the same model. This is 
necessary because to train the world model, I have tens to hundreds of hours 
of speech whereas the speaker models are adapted from the world model using 
only a few seconds to a few minutes of speech.

A naive way to implement this in parallel would be to copy my existing 
implementation and call RemoteController's scatterAll/executeAll/gatherAll 
in the places where the original algorithm loops over the data (the 
expectation step).

I'm hoping I can avoid this duplication. My first idea is to make something 
like a LocalController that implements IPython1's IController interface in a 
way that makes sense for single node operation. This way, I can implement my 
algorithm once in terms of IController operations, test it easily, and later 
by simply setting a controller property on an instance of the class 
implementing the algorithm, decide whether it runs on a single node or in 
parallel.

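To show what I mean, here is a rough single-process stand-in (my sketch; the scatterAll/executeAll/gatherAll names follow the RemoteController calls mentioned above, and the dict-per-"engine" namespaces are an assumption about how to fake the remote model locally):

```python
import numpy as np

class LocalController:
    """Single-process stand-in for the controller interface: each
    fake engine is just a namespace dict, and executeAll runs code
    strings against those namespaces."""

    def __init__(self, n_fake_engines=1):
        self._engines = [dict() for _ in range(n_fake_engines)]

    def scatterAll(self, name, seq):
        # split seq into roughly equal pieces, one per "engine"
        pieces = np.array_split(np.asarray(seq), len(self._engines))
        for ns, piece in zip(self._engines, pieces):
            ns[name] = piece

    def executeAll(self, code):
        for ns in self._engines:
            exec(code, {"np": np}, ns)

    def gatherAll(self, name):
        return np.concatenate([ns[name] for ns in self._engines])

def distributed_sum_of_squares(controller, data):
    """Toy algorithm written purely against the controller interface,
    so it should run unchanged on a local or a remote controller."""
    controller.scatterAll("x", data)
    controller.executeAll("part = np.array([(x ** 2).sum()])")
    return controller.gatherAll("part").sum()
```

The expectation step of k-means or GMM training would slot into the same shape: scatter the data once, execute the statistics computation on all engines each iteration, gather and combine.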
How do you guys handle these issues in your code? Any suggestions would be 
appreciated.



P.S. For an example implementation of k-means on top of MPI, see 
