[IPython-User] Load balancing IPython.parallel using MPI?
Wed Aug 29 17:04:51 CDT 2012
On Wed, Aug 29, 2012 at 7:07 AM, Jon Olav Vik <email@example.com> wrote:
> I am using IPython.parallel for load-balanced, fault-tolerant computation
> on a
> shared cluster running a PBS batch system. However, when I scale into
> of engines it seems that the number of TCP connections becomes a limiting
> factor, to the extent of slowing down the job submission system and
> other users. System admins tell me the cluster is really intended for
> jobs. I am not transferring large messages, only workpiece indexes, but it
> seems the sheer number of messages is the problem.
How many engines starts to cause trouble? I don't have access to hardware
to test this sort of things,
and would love data points. I have found that it behaves fairly well at
least up to 128 engines,
which is the most I have run.
> Can IPython.parallel use MPI instead of TCP for its communication between
> ipcontroller and ipengine?
No, it cannot. The answer to this will be to shard schedulers across
multiple machines, which is not yet implemented. The Client and Engine
already would have no problem, it's only starting the cloned Schedulers
that needs to be added.
The best you can do right now is to limit your IPython jobs, such that you
have one IPython cluster per N engines, where N is a reasonable number that
you find performs adequately. Then you would, in your client code, have to
distribute work chunks across your P/N clients.
> Previously, I ran into a "too many open files" limitation on another
> On this one, the ulimit is higher, so I'm not hitting a hard limit, but
> I seem to be competing with the batch system's monitoring of nodes.
> Googling for IPython and MPI, what I've found mostly seems to be about
> *explicitly* using MPI for message passing, e.g.:
> What I would like instead is to take my existing program and just switch
> transport of ZMQ messages over to MPI, just by changing the configuration
> IPython. Is that possible?
> An example program is given below -- ideally, I would like to get away
> with at
> most one mention of "MPI" in the code 8-)
> Thanks in advance,
> Jon Olav
> """Distributed, load-balanced computation with IPython.parallel."""
> from IPython.parallel import Client
> c = Client()
> lv = c.load_balanced_view()
> @lv.parallel(ordered=False, retries=10)
> def busywait(workpiece_id, seconds):
> """Task to be run on engines. Return i and process id."""
> # Imports that will be executed on the client.
> import os
> import time
> t0 = time.time()
> while (time.time() - t0) < seconds:
> return workpiece_id, seconds, os.getpid()
> workpiece_id = range(15)
> seconds = [2 + i % 5 for i in workpiece_id] # percent sign means 'modulus'
> async = busywait.map(workpiece_id, seconds)
> IPython-User mailing list
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the IPython-User