[IPython-dev] Suggestions for implementing parallel algorithms?

Brian Granger ellisonbg.net at gmail.com
Thu Nov 9 23:34:49 CST 2006


Albert,

This sounds like a nice application of IPython1.  Fernando and I have
had a number of talks about this exact issue lately.  One thing I
should say before all else: in spite of years of research in parallel
computing and algorithms, there is basically no research on
*interactive* parallel computing.  I am not sure you need your
application to be interactive, but if you do, at some level, you are
in new territory.

With that said, there are some things to keep in mind.

First, it is important to realize that all the previous work done on
parallel algorithm/application development still applies even though
the result can be used interactively.  For example, if you need to
move data around between the ipython engines, you should still use MPI
- and all the guidelines for using MPI still apply.

The new and interesting question is really "how and where do you want
to interact with the parallel application as a human user?"  There are
many models of how you would want your application abstracted for
interactive usage.  And at some level, you may want to have the
interactive API very different from the underlying computational model
and parallel algorithm.  You may want to hide the parallelism, or you
may find it better to show it explicitly in the API.

In my own work, I have tended to factor my application into units that
can perform the basic computational tasks for both the serial and
parallel versions of the code.  I then use these as building blocks to
build the parallel and serial versions.  If the low level components
are factored well, the high level algorithm is typically very short
and I don't mind maintaining both a serial and a parallel version.
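
As a rough illustration of that factoring, here is a minimal sketch of one
k-means update step (k-means being the example from Albert's P.S.).  The
names assign_and_sum, merge, step_serial and step_parallel are hypothetical,
and a thread pool merely stands in for a set of engines; this is not
IPython1 code.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_and_sum(points, centroids):
    """Building block: per-centroid coordinate sums and counts for one chunk."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the nearest centroid by squared Euclidean distance.
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def merge(partials, centroids):
    """Building block: combine partial sums/counts into new centroids."""
    k, dim = len(centroids), len(centroids[0])
    new = []
    for j in range(k):
        n = sum(counts[j] for _, counts in partials)
        if n == 0:
            new.append(list(centroids[j]))  # keep an empty cluster where it was
        else:
            new.append([sum(sums[j][d] for sums, _ in partials) / n
                        for d in range(dim)])
    return new


def step_serial(points, centroids):
    # The serial version is one chunk followed by a merge.
    return merge([assign_and_sum(points, centroids)], centroids)


def step_parallel(points, centroids, n_workers=2):
    # The parallel version reuses the same blocks; threads stand in for engines.
    chunks = [points[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: assign_and_sum(c, centroids), chunks))
    return merge(partials, centroids)
```

Both high level functions are only a few lines long because all of the real
work lives in the two shared building blocks.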

For many things I do like the scatterAll/executeAll/gatherAll style of
computation - it is extremely lightweight and easy to implement.  The
one thing to be careful of, though, is not to use this approach when MPI
is more appropriate.  Testing the scaling of your application will
quickly reveal whether there are problems like this.
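
To make the shape of that style concrete, here is a minimal sketch in plain
Python.  The functions scatter, execute and gather are hypothetical
stand-ins written for this example, not the IPython1 calls, and the
"engines" are just list entries in one process.  The pattern fits when each
chunk can be processed without talking to the others; once chunks need to
exchange data during the computation, that is the case where MPI is the
better tool.

```python
def scatter(seq, n):
    """Split seq into n contiguous, nearly equal chunks."""
    q, r = divmod(len(seq), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + q + (1 if i < r else 0)  # first r chunks get one extra item
        chunks.append(seq[start:end])
        start = end
    return chunks


def execute(func, chunks):
    """Apply func to every chunk independently (one call per simulated engine)."""
    return [func(chunk) for chunk in chunks]


def gather(parts):
    """Concatenate the per-chunk results back into one list, in order."""
    out = []
    for part in parts:
        out.extend(part)
    return out


def square_all(chunk):
    return [x * x for x in chunk]


data = list(range(10))
result = gather(execute(square_all, scatter(data, 3)))
```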

> I'm hoping I can avoid this duplication. My first idea is to make something
> like a LocalController that implements IPython1's IController interface in a
> way that makes sense for single node operation. This way, I can implement my
> algorithm once in terms of IController operations, test it easily, and later
> by simply setting a controller property on an instance of the class
> implementing the algorithm, decide whether it runs on a single node or in
> parallel.

I had not thought of that before, but it does make sense.  It is sort
of similar to building objects that hide whether the object is being
used in a parallel/serial context.  It is surely worth trying this
approach, but I am not sure how it would turn out in your case.
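
For what it is worth, the idea might look something like the sketch below.
Everything here is an assumption made for illustration: LocalController,
its method names (scatter_all, execute_all, gather_all) and their
signatures are hypothetical and are not taken from IPython1's IController
interface.  The point is only that the algorithm class talks to whatever
controller it is given.

```python
class LocalController:
    """Hypothetical single-process stand-in; NOT IPython1's IController."""

    def __init__(self, n_engines=1):
        # One namespace dict per simulated engine.
        self.namespaces = [{} for _ in range(n_engines)]

    def scatter_all(self, name, seq):
        """Bind contiguous chunks of seq to `name` in each namespace."""
        n = len(self.namespaces)
        q, r = divmod(len(seq), n)
        start = 0
        for i, ns in enumerate(self.namespaces):
            end = start + q + (1 if i < r else 0)
            ns[name] = list(seq[start:end])
            start = end

    def execute_all(self, source):
        """Run the same source string in every namespace."""
        for ns in self.namespaces:
            exec(source, ns)

    def gather_all(self, name):
        """Concatenate the list bound to `name` across namespaces."""
        out = []
        for ns in self.namespaces:
            out.extend(ns[name])
        return out


class SumOfSquares:
    """Algorithm written once, against the controller methods above."""

    def __init__(self, controller):
        self.controller = controller  # swap this to change where it runs

    def run(self, data):
        c = self.controller
        c.scatter_all("x", data)
        c.execute_all("y = [sum(v * v for v in x)]")
        return sum(c.gather_all("y"))
```

A real parallel controller would need to offer the same three methods for
the swap to work, and whether the actual interface maps onto this cleanly
is exactly the part I am unsure about.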

I don't know if this helps, but I would love to see what you end up
trying and what you find most useful - I am curious about all these
things myself.

Brian

> How do you guys handle these issues in your code? Any suggestions would be
> appreciated.
>
> Cheers,
>
> Albert
>
> P.S. For an example implementation of k-means on top of MPI, see
> http://www.cs.umn.edu/~mnjoshi/PKMeans.pdf