[IPython-User] Load balancing IPython.parallel using MPI?

Jon Olav Vik jonovik@gmail....
Wed Aug 29 09:07:15 CDT 2012


I am using IPython.parallel for load-balanced, fault-tolerant computation on a 
shared cluster running a PBS batch system. However, when I scale to hundreds 
of engines, the number of TCP connections seems to become a limiting factor, 
to the point of slowing down the job submission system and bothering other 
users. The system admins tell me the cluster is really intended for MPI-based 
jobs. I am not transferring large messages, only workpiece indexes, but the 
sheer number of messages seems to be the problem.

Can IPython.parallel use MPI instead of TCP for its communication between 
ipcontroller and ipengine?

Previously, I ran into a "too many open files" limitation on another cluster: 
http://thread.gmane.org/gmane.comp.python.ipython.user/8503/focus=8514
On this one, the ulimit is higher, so I'm not hitting a hard limit, but instead 
I seem to be competing with the batch system's monitoring of nodes.


Googling for IPython and MPI, what I've found mostly seems to be about 
*explicitly* using MPI for message passing, e.g.:

http://ipython.org/ipython-doc/dev/parallel/parallel_mpi.html
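
For reference, the explicit style in those docs looks roughly like the sketch 
below (my own sketch, assuming mpi4py is installed and the engines were 
started under mpiexec, e.g. with ipcluster's MPI engine launcher); the message 
passing is spelled out by hand inside the task:

"""Sketch of the *explicit* MPI style (not what I am after)."""
from IPython.parallel import Client

c = Client()
view = c[:]  # direct view over all the MPI-started engines

@view.remote(block=True)
def psum(local_value):
    """Each engine contributes a value; the reduction is done by an
    explicit MPI collective call."""
    from mpi4py import MPI
    # This is the part I would rather not have to write myself.
    return MPI.COMM_WORLD.allreduce(local_value, op=MPI.SUM)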

What I would like instead is to take my existing program and switch the 
transport of the ZMQ messages over to MPI simply by changing the IPython 
configuration. Is that possible?
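
To make that concrete, what I am imagining is a one-line switch in the 
controller/engine configuration along the lines of the sketch below. This is 
purely wishful thinking on my part -- as far as I can tell the ZeroMQ 
transports on offer are things like 'tcp' and 'ipc', and I do not know of any 
'mpi' option:

# ipcontroller_config.py -- hypothetical sketch, not a working configuration
c = get_config()
# Something like this is what I am hoping for; both the option name and
# the 'mpi' value are made up for illustration.
c.HubFactory.transport = 'mpi'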

An example program is given below -- ideally, I would like to get away with at 
most one mention of "MPI" in the code 8-)

Thanks in advance,
Jon Olav


"""Distributed, load-balanced computation with IPython.parallel."""

from IPython.parallel import Client

c = Client()
lv = c.load_balanced_view()

@lv.parallel(ordered=False, retries=10)
def busywait(workpiece_id, seconds):
    """Task to be run on engines. Return i and process id."""
    # Imports that will be executed on the client.
    import os
    import time
    
    t0 = time.time()
    while (time.time() - t0) < seconds:
        pass
    return workpiece_id, seconds, os.getpid()

workpiece_id = range(15)
seconds = [2 + i % 5 for i in workpiece_id]  # '%' is the modulus operator
async = busywait.map(workpiece_id, seconds)
async.wait_interactive()
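
For completeness, I then consume the results as they arrive (with 
ordered=False the AsyncMapResult yields results in completion order rather 
than submission order) -- a sketch:

# Iterate over results as engines finish; each item is the tuple
# returned by busywait().
for wid, sec, pid in async:
    print "workpiece %d took %d s on pid %d" % (wid, sec, pid)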



