[IPython-User] Parallel ipython over ssh+NFS

Jose Gomez-Dans jgomezdans@gmail....
Tue Jun 12 12:10:34 CDT 2012


Hi,

I have IPython 0.11 on our system, and I would like to do some
trivial parallel processing, making use of SSH and the fact that our
home directories are on NFS (i.e. ~/.config/ipython is visible at the
same path on all hosts).

So, I followed the instructions here
<http://mail.scipy.org/pipermail/ipython-user/2011-December/008833.html>
1.- Create a new test profile using
       $ ipython profile create sshtest --parallel
2.- Edit ipcluster_config.py to look like this

c = get_config()

# delay 60 seconds between starting the controller and starting the engines
c.IPClusterStart.delay = 60

# start engines with SSH
c.IPClusterEngines.engine_launcher = \
        'IPython.parallel.apps.launcher.SSHEngineSetLauncher'

# You only need to use the SSHController launcher if you are *not*
# running ipcluster from the controller machine
c.IPClusterStart.controller_launcher = \
    'IPython.parallel.apps.launcher.SSHControllerLauncher'
c.SSHControllerLauncher.hostname = 'my_hostname'

# this and above should result in
#    $> ssh you@conlin.lanl.gov "ipcontroller --ip=128.165...
#
# add `--reuse` to this only *after* everything appears to be working,
# because it can prevent certain changes from having an effect
c.SSHControllerLauncher.program_args = [ '--ip=xx.xx.xx.xx',
    '--log-to-file', '--log-level=10' ]

# if you are not using SSH to launch the controller:
c.LocalControllerLauncher.controller_args = [ '--ip=xx.xx.xx.xx',
    '--log-to-file', '--log-level=10']

c.SSHEngineSetLauncher.engines = {
    'sun-node01': 2,
    'sun-node02': 2, # etc.
    }
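
(As an aside, if I read the docs correctly, the values in that engines
dict can also be (count, [args]) tuples, which would let me pass
per-host arguments. I haven't tested this form, so the snippet below is
just a sketch of what I mean:)

# untested variant of the dict above: a plain int is the engine count,
# while a (count, [args]) tuple also passes extra arguments to the
# engines started on that host
c.SSHEngineSetLauncher.engines = {
    'sun-node01': 2,
    'sun-node02': (2, ['--log-level=20']),
    }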

3.- Test running the controller on my_hostname:
   $ ipcontroller --profile=sshtest
4.- Test connecting an engine from one node (say sun-node02 above):
   $ ipengine --profile=sshtest
5.- Steps (3) and (4) are successful (there's a connection, and I can
pass things to the engines, see their IP addresses, etc; roughly the
check shown in the snippet below)
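
For reference, this is roughly the check I run from the login node;
only where_am_i is my own throwaway helper, the rest is the standard
IPython.parallel client API:

from IPython.parallel import Client

rc = Client(profile='sshtest')  # picks up the connection files from the NFS-shared profile
print rc.ids                    # ids of the engines registered with the controller

def where_am_i():
    # quick sanity check: report the host each engine is running on
    import socket
    return socket.gethostname()

print rc[:].apply_sync(where_am_i)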

Now, I would like to just use ipcluster to launch the controller and
the engines. This doesn't work as expected: if I launch
   $ ipcluster --profile=sshtest

then after ~60s, all I get is ipcluster launching 12 engines on the
controller (the controller has 12 cores, so this is presumably the
default), and they are all local. I can ssh into the nodes and launch
the engines there with ipcluster engines --profile=sshtest, but that
launches 4 engines per node (and not the 2 I configured; these are
4-core machines).
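
(My understanding, which I haven't verified against the 0.11 docs, is
that a line like the one below in ipcluster_config.py would cap the
per-node count when engines are launched locally, but what I really
want is for the SSH launcher settings above to be honoured:)

# unverified guess: limit the number of locally launched engines
# instead of defaulting to one per core
c.IPClusterEngines.n = 2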

I really don't understand what's going on here, but my problem looks
similar to the one Jeremy reported. However, I am only allowed to use
two cores on each node, so launching the engines by hand with
ipcluster engines doesn't work for me.

Additionally, how would one go about giving the engines a particular
"nice" value? If I don't sort this stuff out, I think I might become
very unpopular among my colleagues! ;-)
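
(What I have in mind is something along these lines, i.e. renicing the
engines from the client once they are up; the renice helper is my own
sketch, not something from the IPython docs, and os.nice is Unix-only:)

from IPython.parallel import Client

def renice(increment):
    # runs on each engine and bumps that engine process's niceness
    import os
    return os.nice(increment)

rc = Client(profile='sshtest')
print rc[:].apply_sync(renice, 10)  # each engine reports its new nice value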

Thanks!
Jose

