[SciPy-User] Use of MPI in extension modules

Stefan Seefeld seefeld@sympatico...
Wed Nov 11 16:45:35 CST 2009

On 11/11/2009 01:23 PM, Brian Granger wrote:
> Stefan,
> This is probably a better topic for the IPython users list:
> http://mail.scipy.org/mailman/listinfo/ipython-user

Thanks!
I didn't know that list actually existed. It doesn't appear to be listed on
either http://www.scipy.org or http://ipython.scipy.org, nor on
http://www.scipy.org/Mailing_Lists. I'll cross-post there so we can
continue the conversation there, assuming I'm not moderated.

> > I'm working on a signal & image processing library that uses MPI
> > internally. I'd like to provide a Python interface to it, so I can
> > integrate it into SciPy. With 'normal' Python this all works nicely.
> > Just recently I have started to consider parallelism, i.e. I want to
> > use the library's internal parallelism, by running it with ipython in
> > parallel.
> > My assumption was that all the engines started via 'ipcluster mpiexec
> > ...' would already have MPI_Init called, and thus, my extension modules
> > would merely share the global MPI state with the Python interpreter.
> > That doesn't seem to be the case, as I either see all my module
> > instances report rank 0, or, if I don't call MPI_Init, get a failure on
> > the first MPI call I do.
> You do need to tell the IPython engine how it should call MPI_Init.
> The best way of doing this is to install mpi4py and then call ipcluster
> with the --mpi=mpi4py option. Once you do this, you can simply import
> your extension module and use it - you won't have to call MPI_Init
> again. The reason that IPython needs to be told how MPI_Init is called
> is that we try to make sure that the engine ids match the MPI ranks.

I'm not sure I understand. In fact, I had expected that *only* ipython
needed to know how to call MPI_Init. The rest of my own (extension) code
then merely assumes it has been called with the appropriate arguments
(which ultimately come from "mpirun", which itself is invoked by
ipcluster, isn't it?).

Is that not true ?

Is there some documentation that explains the interaction between
(i)python (the ipcluster.py module in particular), mpirun, and the
ipengine script that the latter then invokes? Maybe I can call
MPI_Init() myself, if I know the arguments I need to pass along.
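
To make the question concrete, here is a minimal sketch of the "initialize
only if nobody else has" guard, written against mpi4py rather than my own
library. The helper names init_mpi_once and _finalize_if_we_initialized are
made up for illustration; this is not how ipengine itself does it, and it
assumes mpi4py is installed (it returns None otherwise).

```python
import atexit


def _finalize_if_we_initialized(MPI):
    # Finalize only if MPI is still up; we registered this because we
    # were the ones who called MPI.Init().
    if MPI.Is_initialized() and not MPI.Is_finalized():
        MPI.Finalize()


def init_mpi_once():
    """Initialize MPI only if it is not initialized yet.

    Returns (rank, size), or None if mpi4py is not available.
    Hypothetical helper, not part of IPython or mpi4py.
    """
    try:
        import mpi4py
        # Ask mpi4py not to call MPI_Init / MPI_Finalize automatically,
        # so the decision is made explicitly below. This has no effect
        # if mpi4py.MPI was already imported elsewhere in the process.
        mpi4py.rc.initialize = False
        mpi4py.rc.finalize = False
        from mpi4py import MPI
    except ImportError:
        return None

    if not MPI.Is_initialized():
        MPI.Init()
        atexit.register(_finalize_if_we_initialized, MPI)

    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


if __name__ == "__main__":
    print(init_mpi_once())
```

If the engine has already initialized MPI, the guard leaves it alone and
only reads the rank and size; if not, this code initializes MPI and is then
also responsible for finalizing it.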

> But, one question. Why not use mpi4py for your MPI calls? If you
> really need low-level C stuff, mpi4py works very well with cython. All
> that would be much more pleasant than writing low-level C/MPI code. The
> key is that mpi4py handles all the subtleties of the different MPI
> platforms and OSs. Doing that yourself is quite painful.

Well, happily this is already done. :-)

(I'm talking about http://www.codesourcery.com/vsiplplusplus)

In fact, we have embedded most of the MPI subtleties deeply in our 
library. Let me (very quickly) outline the idea of our approach:

The library provides a set of block types (for vectors, matrices, 
tensors), which may or may not be distributed. Most of the MPI calls 
need to be done on assignment, i.e. an equation "A = B" will result in 
communication if (and only if) A and B are distributed, and their 
distributions don't match. This programming paradigm is very similar to 
that used in pMatlab (http://www.ll.mit.edu/pMatlab)
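
As a rough illustration of that rule only (the names Distribution, Block and
needs_communication are invented for this sketch and are not the library's
actual API), the decision made on assignment can be written as:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Distribution:
    """Hypothetical description of how a block is split over processes."""
    procs_per_dim: Tuple[int, ...]


@dataclass
class Block:
    """Hypothetical block; dist is None when the block is not distributed."""
    name: str
    dist: Optional[Distribution] = None


def needs_communication(dst: Block, src: Block) -> bool:
    """Return True if assigning src to dst requires communication.

    Communication happens if and only if both blocks are distributed
    and their distributions do not match.
    """
    if dst.dist is None or src.dist is None:
        return False
    return dst.dist != src.dist


if __name__ == "__main__":
    a = Block("A", Distribution((2, 1)))  # split along the first dimension
    b = Block("B", Distribution((1, 2)))  # split along the second dimension
    print(needs_communication(a, b))
```

The actual data movement is left out entirely; the sketch only shows when it
would be triggered.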

So, all of this is already done. I'm now merely interested in adding 
Python bindings to it.



       ...ich hab' noch einen Koffer in Berlin...
