[IPython-User] parallel ssh problems

MinRK benjaminrk@gmail....
Mon Oct 24 15:34:46 CDT 2011


On Mon, Oct 24, 2011 at 08:13, Toby Burnett <tburnett@uw.edu> wrote:

>  Thanks for the clarifications, but my complaint was really that I don’t
> understand why it started the engines, and then shut them down. Are those
> tcgetattr messages significant?
>

ipcluster did not shut them down; they stopped themselves, presumably
because they couldn't connect.  The tcgetattr messages are probably not
significant.  I think they are warnings about the tunneled session not
having a real terminal.

Odds are that your engines cannot see your controller, especially if they
appear to shut down after about two seconds, which is the default timeout
for registration.  You can view the engine logs in
~/.ipython/profile_default/log/ipengine-<pid>.log.  Did you specify the
controller's IP as one visible from the other nodes?  The default is
localhost, which won't be visible to engines on other machines.  For
security reasons, we don't listen on external interfaces by default, which
means that you have to do at least some configuration if you want to use
nonlocal engines.
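
One rough way to test the visibility problem described above is to try a
plain TCP connection from an engine host to the controller.  The following
is only a sketch for Python 3: `can_reach` is a made-up helper name, not
part of IPython, and the host and port in the usage comment are placeholders
for whatever address and registration port your controller is actually
using.

```python
import socket


def can_reach(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds in time.

    Hypothetical helper for illustration; it only checks that something
    is listening at that address, not that it is an IPython controller.
    """
    try:
        # create_connection resolves the host and applies the timeout
        # to the connection attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or name resolution failed.
        return False


# Usage on an engine host (placeholder values, not taken from this thread):
# print(can_reach("controller.example.org", 12345))
```

If this returns False from the engine host but True on the controller
machine itself, the controller is most likely still listening only on
localhost, or a firewall sits in between.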

When debugging, it is often much easier to *not* use ipcluster, and instead
run ipcontroller and one or more instances of ipengine by hand.  Until you
have one engine connected manually to the controller, the opacity of
ipcluster is counterproductive.  All the ipcluster script does is call
ipcontroller once and ipengine a few times, so the behavior should not be
different.  The Launcher classes are just ways to call ipengine in various
contexts.  The SSH launcher is essentially `ssh <host> "ipengine"`, with
some extra args to match profiles and to send log output to files.
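
To make the "essentially `ssh <host> "ipengine"`" description concrete,
here is a small sketch that builds the equivalent commands by hand so you
can run them one at a time.  It is not the launcher's actual code:
`engine_commands` is a made-up name, it assumes `ipengine` is on the PATH
of the remote host, and it passes only `--profile`, whereas the real
launcher adds further arguments such as the logging options.

```python
def engine_commands(engines, profile, user=None):
    """Build one ssh command (as an argv list) per requested engine.

    `engines` maps host name -> number of engines, in the same shape as
    a {'tev01': 4} engines dict.  Hypothetical helper for illustration.
    """
    commands = []
    for host, count in sorted(engines.items()):
        # Prefix the login name only when one was given.
        target = "%s@%s" % (user, host) if user else host
        for _ in range(count):
            commands.append(
                ["ssh", target, "ipengine", "--profile=%s" % profile]
            )
    return commands


# Print the commands rather than running them, so each can be tried by hand.
for argv in engine_commands({"tev01": 4}, profile="ssh"):
    print(" ".join(argv))
```

Running one of the printed commands in a terminal shows the engine's own
output directly, which is usually more informative than what ipcluster
relays.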



> --Toby
>
> *From:* MinRK [mailto:benjaminrk@gmail.com]
> *Sent:* Sunday, October 23, 2011 14:24
> *To:* Toby Burnett
> *Cc:* ipython-user@scipy.org
> *Subject:* Re: [IPython-User] parallel ssh problems
>
> On Sun, Oct 23, 2011 at 11:38, Toby Burnett <tburnett@uw.edu> wrote:
>
> Sorry, after reading the instructions, I realized that I set the wrong
> value, but there is some confusion between the online help and instructions
> in the generated config-ssh/ipcluster_config.py, so I put in both lines.
>
> Argh, when in conflict, the online docs are out of date.  I'll update them
> now.  The default config files are automatically generated from the
> configurable objects,
> so a fresh `ipython profile create <name> --parallel` can't be out of date.
>
>
> c.IPClusterEngines.engine_launcher_class = 'SSHEngineSetLauncher'
> c.IPClusterEngines.engine_launcher = 'IPython.parallel.apps.launcher.SSHEngineSetLauncher'
>
> Changes from 0.11 to 0.12:
>
>   * added the _class suffix to the name, to be clearer
>
>   * allowed launchers from IPython.parallel.apps.launcher to be specified
>     by class name only, for convenience.  In fact, you can now just specify
>     'SSH' or 'MPIExec', and it will resolve to
>     'IPython.parallel.apps.launcher.SSHFooLauncher'.
>
> Obviously, trying to clarify things without updating the docs is not a
> great success.  What you have is exactly right for a config file to work on
> both 0.11 and 0.12.  I will add a deprecation warning on the old name, so
> that users moving from 0.11 to 0.12 get some help, and some more detail to
> docs and helpstrings, to hopefully avoid future confusion.
>
>
> and I set c.SSHEngineSetLauncher.engines to {'tev01':4}, a different
> machine from the one on which I ran ipcluster.
> The results follow: the last line is very confusing; I have no idea where
> it got the non-existent machine names.
>
> ha, that's just a poor choice on my part.  When you start multiple engines
> on a single host, their keys in the dict that tracks them (which you are
> seeing in the log message) will be 'host0', 'host1', 'host2', etc.
> Obviously, that doesn't sit well with nodeNN machine naming, because the
> keys still look like machine names.  I'll add a '/' separator, so it's
> clearer that these are four engines on 'tev01', not one engine each on
> 'tev010', 'tev011', etc.
>
>
> tev11:~/analysis[878]$ipcluster start --profile=ssh &
>
> [IPClusterStart] Using existing profile dir:
> u'/phys/users/tburnett/.ipython/profile_ssh'
> will start the following engines: {'tev01': 4}
>
> [IPClusterStart] Starting ipcluster with [daemon=False]
> [IPClusterStart] Creating pid file:
> /phys/users/tburnett/.ipython/profile_ssh/pid/ipcluster.pid
>
> [IPClusterStart] Starting LocalControllerLauncher:
> ['/phys/users/olsont/TEV/Glast/python27/bin/python2.7',
> u'/phys/users/olsont/TEV/Glast/python27/lib/python2.7/site-packages/ipython-0.11-py2.7.egg/IPython/parallel/apps/ipcontrollerapp.py',
> '--log-to-file', '--log-level=20',
> u'--profile-dir=/phys/users/tburnett/.ipython/profile_ssh']
> [IPClusterStart] Process
> '/phys/users/olsont/TEV/Glast/python27/bin/python2.7' started: 20849
> [IPClusterStart] [IPControllerApp] Using existing profile dir:
> u'/phys/users/tburnett/.ipython/profile_ssh'
> [IPClusterStart] Scheduler started [leastload]
> [IPClusterStart] Starting 24 engines
> [IPClusterStart] Process 'ssh' started: 20868
> [IPClusterStart] Starting SSHEngineSetLauncher: ['ssh', '-tt',
> u'tburnett@tev01', '/phys/users/olsont/TEV/Glast/python27/bin/python2.7',
> u'/phys/users/olsont/TEV/Glast/python27/lib/python2.7/site-packages/ipython-0.11-py2.7.egg/IPython/parallel/apps/ipengineapp.py',
> '--log-to-file', '--log-level=20']
> [IPClusterStart] Process 'ssh' started: 20869
> [IPClusterStart] Process 'ssh' started: 20870
> [IPClusterStart] Process 'ssh' started: 20871
> [IPClusterStart] Process 'engine set' started: [None, None, None, None]
> [IPClusterStart] tcgetattr: Invalid argument
> [IPClusterStart] tcgetattr: Invalid argument
> [IPClusterStart] tcgetattr: Invalid argument
> [IPClusterStart] tcgetattr: Invalid argument
> [IPClusterStart] [IPEngineApp] Using existing profile dir:
> u'/phys/users/tburnett/.ipython/profile_default'
> [IPClusterStart] [IPEngineApp] Using existing profile dir:
> u'/phys/users/tburnett/.ipython/profile_default'
> [IPClusterStart] [IPEngineApp] Using existing profile dir:
> u'/phys/users/tburnett/.ipython/profile_default'
> [IPClusterStart] [IPEngineApp] Using existing profile dir:
> u'/phys/users/tburnett/.ipython/profile_default'
> [IPClusterStart] Connection to tev01 closed.
> [IPClusterStart] Process 'ssh' stopped: {'pid': 20870, 'exit_code': 255}
> [IPClusterStart] Connection to tev01 closed.
> [IPClusterStart] Process 'ssh' stopped: {'pid': 20869, 'exit_code': 255}
> [IPClusterStart] Connection to tev01 closed.
> [IPClusterStart] Process 'ssh' stopped: {'pid': 20868, 'exit_code': 255}
> [IPClusterStart] Connection to tev01 closed.
> [IPClusterStart] Process 'ssh' stopped: {'pid': 20871, 'exit_code': 255}
> [IPClusterStart] Process 'engine set' stopped: {'tev012': {'pid': 20870,
> 'exit_code': 255}, 'tev013': {'pid': 20871, 'exit_code': 255}, 'tev010':
> {'pid': 20868, 'exit_code': 255}, 'tev011': {'pid': 20869, 'exit_code':
> 255}}