[IPython-User] Parallel ipython over ssh+NFS

MinRK benjaminrk@gmail....
Tue Jun 12 14:43:22 CDT 2012

On Tue, Jun 12, 2012 at 12:00 PM, Jose Gomez-Dans <jgomezdans@gmail.com> wrote:

> Hi
> Thanks for your reply. There are improvements, but not quite there yet...
> On 12 June 2012 19:31, MinRK <benjaminrk@gmail.com> wrote:
> > I believe the above config should be controller_launcher_class and
> > engine_launcher_class.
> > I imagine your issue stems from the config typos above.  This is behaving
> > exactly as it would if you had simply not specified the engine/controller
> > launcher classes (which, effectively, you haven't, since they are given
> > under the wrong names).
> >
> > When in doubt, always add `--debug`.  I expect you will see "Starting 12
> > engines with LocalEngineSetLauncher" (this is actually displayed at the
> > default log-level, at least on master).
> I don't think ipcluster has a --debug option in 0.11 (it complains it
> doesn't understand it).

Weird.  I would also *strongly* recommend updating to current IPython
(0.12.1), or using master, which should be released as 0.13 within a month.
Both have some improvements to the launchers.
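For reference, the launcher config lines should look something like this (a
sketch; the full class paths are as in 0.12, and may differ slightly in your
version):

```python
# ipcluster_config.py in the sshtest profile -- note the *_launcher_class names
c = get_config()

c.IPClusterStart.controller_launcher_class = \
    'IPython.parallel.apps.launcher.SSHControllerLauncher'
c.IPClusterEngines.engine_launcher_class = \
    'IPython.parallel.apps.launcher.SSHEngineSetLauncher'
```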

> Now, after solving those typos, and running
> ipcluster start --profile=sshtest, this comes up on the screen:
> [IPClusterStart] Using existing profile dir:
> u'/home/ucfajlg/.config/ipython/profile_sshtest'
> [IPClusterStart] Starting ipcluster with [daemon=False]
> [IPClusterStart] Creating pid file:
> /home/ucfajlg/.config/ipython/profile_sshtest/pid/ipcluster.pid
> [IPClusterStart] Process 'ssh' started: 16737
> [IPClusterStart] tcgetattr: Invalid argument
> [IPClusterStart] [IPControllerApp] Config changed:
> [IPClusterStart] [IPControllerApp] {'Application': {'log_level': 10},
> 'HubFactory': {'ip': u'xx.xx.xx.xx'}, 'BaseParallelApplication':
> {'log_to_file': True}}
> [IPClusterStart] [IPControllerApp] Using existing profile dir:
> u'/home/ucfajlg/.config/ipython/profile_default'
> [IPClusterStart] [IPControllerApp] Attempting to load config file:
> ipython_config.py
> [IPClusterStart] [IPControllerApp] Config changed:
> [IPClusterStart] [IPControllerApp] {'Application': {'log_level': 10},
> 'HubFactory': {'ip': u'xx.xx.xx.xx'}, 'TerminalIPythonApp':
> {'extensions': ['kernmagic']}, 'ProfileDir': {},
> 'BaseParallelApplication': {'log_to_file': True}}
> [IPClusterStart] [IPControllerApp] Attempting to load config file:
> ipcontroller_config.py
> [IPClusterStart] Scheduler started [leastload]
> [IPClusterStart] Starting 12 engines
> [IPClusterStart] Process 'ssh' started: 16780
> [IPClusterStart] Starting SSHEngineSetLauncher: ['ssh', '-tt',
> u'sun-node08', '/opt/epd-7.1-2-rh5-x86_64/bin/python',
> u'/opt/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/apps/ipengineapp.py',
> '--log-to-file', '--log-level=20']
> [IPClusterStart] Process 'ssh' started: 16781
> [IPClusterStart] Process 'ssh' started: 16782
> [...]
> [IPClusterStart] Process 'ssh' started: 16807
> [IPClusterStart] Process 'engine set' started: [None, None, None,
> None, None, None, None, None, None, None, None, None, None, None,
> None, None, None, None, None, None, None, None, None, None, None,
> None, None, None]
> [IPClusterStart] tcgetattr: Invalid argument
> [IPClusterStart] tcgetattr: Invalid argument
> [...]
> [IPClusterStart] [IPEngineApp] Using existing profile dir:
> u'/home/xxx/.config/ipython/profile_default'
> [IPClusterStart] [IPEngineApp] Using existing profile dir:
> u'/home/xxx/.config/ipython/profile_default'
> So, it appears to launch ssh processes galore, which is good. But
> what's this about tcgetattr?

It is a useless warning; I think maybe the '-tt' arg isn't what it should be.

> Anyway, trying to connect works with
> rc = Client ()
> # OK, I can see the remote engines
> but fails if I specify the profile
> rc = Client ( profile="sshtest")
> TimeoutError: Hub connection request timed out
> I can get it to work ignoring the profile option, but I wonder whether
> I should be wary of this?

Looking at your log output, with your config for the controller args, you
have specified that ipcluster with profile sshtest should start the
Controller and Engines under the *default* profile.  That is why connecting
a Client with the default profile works: you haven't actually started a
Controller with the sshtest profile.  Adding '--profile=sshtest' to your
engine/controller args should address this.  I also think this would not
come up if you were using 0.12 or master, where the profile_dir arg should
~always be specified.
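The reason the mismatch shows up as a timeout: Client(profile=...) just
resolves a connection file under that profile's directory, and if no
controller ever wrote one for that profile, the connect attempt has nothing
to reach.  Roughly (a sketch of the path layout as of 0.11/0.12, not the
actual implementation):

```python
import os.path

def client_connection_file(profile, ipython_dir="~/.config/ipython"):
    """Roughly where Client(profile=...) looks for the controller's
    connection info.  If no controller has written this file for the
    given profile, the Hub connection request times out."""
    return os.path.expanduser(os.path.join(
        ipython_dir,
        "profile_%s" % profile,
        "security",
        "ipcontroller-client.json",
    ))

print(client_connection_file("sshtest"))
```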

> >> Additionally, how would one go about giving the engines a particular
> >> "nice" value? If I don't sort this stuff out, I think I might become
> >> very unpopular among my colleagues! ;-)
> >
> >
> > nice is not something ipcluster exposes, you will have to either create
> your
> > own launchers or use ipengine directly.
> In previous versions, there used to be sshx.sh where you could do this
> (also important to set up paths and stuff like that). Is this
> documented somewhere?

The SSH launchers after the 0.11 reorganization are a significant
regression from sshx in 0.10.2.  They have inched forward in the releases
since then, but I would say they are still pretty poor.

You are welcome to submit Issues (or better yet, pull requests) to improve
them, but I wouldn't expect them to improve quickly, as they are a fairly
low priority right now.
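To follow up on the nice question: skipping ipcluster makes that easy too,
because you control the remote command line yourself.  A dry-run sketch
(hostnames are placeholders; remove the echo to actually launch):

```shell
#!/usr/bin/env bash
# start low-priority engines directly over ssh; node01/node02 are hypothetical
NICENESS=19

for host in node01 node02; do
    cmd="ssh $host \"nice -n $NICENESS ipengine --profile=sshtest --log-to-file\""
    # dry run: print the command instead of executing it; use eval "$cmd" for real
    echo "$cmd"
done
```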

I should also note that just because ipcluster starts engines with ssh
doesn't mean it's the *only* way to start engines with ssh.  In most cases
where you are fully aware of your own system, it's fairly easy to write a
bash script in a few minutes that outperforms ipcluster with SSH across the
board, because it doesn't have to do anything clever, and you are able to
make all kinds of assumptions that ipcluster does not:

1. start a controller somewhere
2. [optional] distribute connection files if not on a shared filesystem
3. start engines with ssh

For instance, a sketch off the top of my head, which I just confirmed to
work in my office, which is on NFS/Linux:

#!/usr/bin/env bash

bg="screen -dmS"

$bg controller ipcontroller --profile=ssh --ip=

sleep 2

for host in edison langmuir; do
    ssh $host $bg engines ipcluster engines -n 4 --profile=ssh
done
And this is with *zero* config files - the ssh profile did not even exist
before running this script.

To kill engines:

for host in edison langmuir; do
    ssh $host screen -X -S engines quit
done

and stop the controller:

screen -X -S controller quit

I know it seems a little weird for me to be suggesting the answer to
ipcluster with SSH is to not use ipcluster at all, but this is honestly how
I work more often.  The SSHLaunchers are crummy.


> Thanks for your prompt and helpful answer again!
> J
> _______________________________________________
> IPython-User mailing list
> IPython-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-user