[IPython-dev] Parallel map

Gael Varoquaux gael.varoquaux@normalesup....
Sat Mar 8 04:03:29 CST 2008


On Fri, Mar 07, 2008 at 08:42:26PM +0100, Gael Varoquaux wrote:
> On Fri, Mar 07, 2008 at 08:10:36PM +0100, Gael Varoquaux wrote:
> > I am trying to do a parallel map using ipython1. Is there a really simple
> > way to do this, or a tutorial somewhere telling me how? I can probably
> > figure it out, but I have to dig through a fair amount of
> > tutorial/doc/wiki articles/reading source code to move forward.

> > My requirement is that I want the code to be purely valid self sustained
> > Python code.

> OK, making some progress at this.

> I found out I need to create a MultiEngineClient

> rc = client.MultiEngineClient(('127.0.0.1', 10105))

> and I can use its map method.

I succeeded (I had a good night's sleep in between) by piggybacking on
the ipcluster script. It is a bit ugly, but I am posting the code here
for future reference.

What made my task hard was both the fact that there is no obvious way of
creating a cluster from Python, and the fact that ipython1.kernel.api was
removed while all the information I can find on the web still uses
ipython1.kernel.api.RemoteController.

Now the irony is that I ended up not being able to use ipython1 for the
problem I was interested in, as the objects I wanted to send to my
parallel map were not picklable. I wrote a small hack using threading
and os.system to do the work. I suspect this is a limitation people are
going to bump into quite often. Ideas to make a workaround more or less
part of ipython1 natively would be great. In my case, the objects I had
to scatter were directly imported from a module, so scattering a module
path as a string (e.g. 'ipython1.kernel.client.MultiEngineClient') would
have been an option. I have little experience with these problems, so I
don't pretend to be suggesting a good solution.
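
To make the "module path as a string" idea concrete, here is a minimal
sketch in plain Python. The helper names resolve and call_by_path are
hypothetical and are not part of ipython1. The builtin map stands in for
the parallel map, and math.sqrt stands in for whatever importable object
one actually wants to use. With a real cluster, the worker function would
presumably still have to be importable on the engines.

```python
import pickle
from importlib import import_module


def resolve(dotted_path):
    """Import and return the object named by a dotted path string.

    Hypothetical helper for illustration; not part of ipython1.
    """
    module_name, _, attr_name = dotted_path.rpartition('.')
    module = import_module(module_name)
    return getattr(module, attr_name)


def call_by_path(task):
    """Worker: look the callable up by name, then apply it to the argument."""
    dotted_path, arg = task
    return resolve(dotted_path)(arg)


# Each task is a plain (string, number) tuple, so it pickles even when the
# target object itself would not.
tasks = [('math.sqrt', x) for x in (1.0, 4.0, 9.0)]
assert pickle.loads(pickle.dumps(tasks)) == tasks

# The builtin map stands in for a parallel map here.
results = list(map(call_by_path, tasks))
```

Only strings and numbers cross the process boundary; the lookup happens
on the receiving side.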

Anyway, thanks for ipython1, keep up the good work, it is a difficult but
important task,

Cheers,

Gaël

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
"""
Provides a simple parallel map.
"""

# Piggy-back the ipcluster script to start the engines.
from ipython1.kernel.scripts import ipcluster as cluster
from ipython1.kernel.client import MultiEngineClient
from threading import Thread
from time import sleep
import sys

##############################################################################
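def guess_ncpu():
    """ Parses /proc/cpuinfo to guess the number of CPUs on the box.
        This has been tested only under Linux.
    """
    ncpu = 0
    # open() replaces the Python 2-only file() builtin; the file is closed
    # explicitly once all lines have been read.
    cpuinfo = open('/proc/cpuinfo')
    try:
        for line in cpuinfo:
            if line.startswith('processor\t'):
                ncpu += 1
    finally:
        cpuinfo.close()
    return ncpu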

##############################################################################
# Code to start the engine and create the controller
def start_cluster(ncpu=None):
    """ Starts a cluster on the local computer and returns a controller
        to the cluster.
    """
    # Guess the CPU count at call time rather than at import time, so that
    # importing this module does not read /proc/cpuinfo.
    if ncpu is None:
        ncpu = guess_ncpu()
    # We use ipcluster.main, but it takes its instructions from sys.argv,
    # thus we override it
    orig_argv = sys.argv
    sys.argv = ['foo', '-n', str(ncpu)]
    try:
        # Starting the cluster is a blocking operation. We thus need a
        # thread to do the work.
        Thread(target=cluster.main).start()
        # There is a sleep(3) in ipcluster
        sleep(4)
    finally:
        sys.argv = orig_argv
    return MultiEngineClient(('127.0.0.1', 10105))

##############################################################################
# This code is so trivial that you should really call the controller's map
# method directly if you are going to do anything more than run pmap once
# (keep in mind that creating the cluster has an overhead).
def pmap(func, seq, ncpu=None):
    """ Creates a cluster of ipython1 engines and runs a parallel map on
        it.
    """
    # Guess the CPU count at call time rather than at import time.
    if ncpu is None:
        ncpu = guess_ncpu()
    mec = start_cluster(ncpu=ncpu)
    outseq = mec.map(func, seq)
    mec.kill(controler=True)
    return outseq
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


