[IPython-User] parallel with ssh

MinRK benjaminrk@gmail....
Wed Oct 26 16:14:26 CDT 2011


On Wed, Oct 26, 2011 at 05:20, Toby Burnett <tburnett@uw.edu> wrote:

> A follow-up on a previous thread, and a difference with version 0.10.
>
> First, the previous “ipcluster ssh --clusterfile …” worked nicely with only
> the need to specify a simple file.  Now it is necessary to specify not only
> the set of engines, but also to set both engine and controller options for
> communication.  Perhaps the code that implements ipcluster could either set
> these, or clearly warn the user?
>

I think we can add warnings to ipcluster (we do exactly
that<http://ipython.org/ipython-doc/dev/parallel/parallel_process.html#general-considerations>,
first thing in the docs), though I'm actually not 100% sure how to
distinguish a failure to connect from a later shutdown, except by timing.
The problem with just 'set this' is that there is no obvious answer to
what should be set.  For instance, if your engines are tunneling their
connection to the controller, then there is no need for the controller to
listen on a public IP.  Further, if you have config files all set up on the
remote machine, then there's nothing for ipcluster to do, nor any way
for ipcluster to detect that your config is correct.  If we are to set the
default IP to something other than localhost (this *mustn't* be done in
general, but possibly when using ssh engines), should it be all interfaces,
or just one of possibly many public IPs?

The reason we don't listen on external interfaces by default is that our
communication is not encrypted.  Unauthorized execution is prevented, but
viewing of currently active result output is not: anyone with access to the
listening ports on your controller can see your output.  For this reason, we
don't want to decide for you to listen on public interfaces; you have to
make that choice.
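
If you do decide to listen beyond localhost, that choice goes in the
controller's config.  A minimal sketch, assuming the HubFactory.ip option
described in the parallel docs and a made-up address; check the option name
against your installed version before relying on it:

```python
# ipcontroller_config.py in your profile directory (sketch, not a drop-in file)
c = get_config()

# Assumption: 192.168.1.16 is a hypothetical address on a private network
# that your engines can reach.
c.HubFactory.ip = '192.168.1.16'

# Listening on all interfaces instead exposes the unencrypted traffic
# more widely, so only do this on a network you trust:
# c.HubFactory.ip = '*'
```

Engines that tunnel their connections over ssh don't need this at all, and
the controller can stay on localhost.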

But the real answer is that the SSH launcher is, in many ways, a regression
from 0.10, notably:

* it depends on a shared filesystem (assuming otherwise default config)
* it requires the launching ssh processes to maintain their connections (so
it can't be used from a laptop, which disconnects)

Among others.  Any improvements would be extremely welcome.


> But my advice, thanks MinRK, is to avoid ipcluster and start the
> controller and engines myself, which is in fact necessary if the controller
> and engines are not on a shared filesystem.  I’ve done so in a class with a
> loop like this:
>
>         # requires: import os, time
>         for host, n in self.engines.items():
>             for i in range(n):
>                 # start one backgrounded ssh client per engine
>                 cmd = 'ssh %s@%s ipengine --file=%s &' % (self.user, host, self.json_file)
>                 os.system(cmd)
>                 time.sleep(delay)
>
>
> It works, but leaves a lot of background ssh processes on the machine where
> this runs.  Studying the very elegant code in
> IPython/parallel/apps/launcher.py, I’m sure that it should be easy to start n
> engines on host m by creating an SSHEngineSetLauncher. Am I right?
>

Yes, just like you can write your own launchers for ipcluster to use, you
can use our launchers outside ipcluster.
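
Short of that, the stray background ssh processes can be kept under control
by holding on to their handles instead of backgrounding them with '&'.  A
minimal sketch using only the standard library (this is not the launcher
API; the function names are made up for illustration, and --file is taken
from your loop):

```python
import subprocess
import time


def build_engine_commands(engines, user, json_file):
    """Return one ssh argv list per engine to start.

    engines maps hostname -> number of engines, as in the loop above.
    """
    commands = []
    for host, n in sorted(engines.items()):
        for _ in range(n):
            commands.append(
                ["ssh", "%s@%s" % (user, host), "ipengine", "--file=%s" % json_file]
            )
    return commands


def launch_engines(engines, user, json_file, delay=0.5):
    """Start every engine and return the Popen handles of the ssh clients."""
    procs = []
    for argv in build_engine_commands(engines, user, json_file):
        procs.append(subprocess.Popen(argv))
        time.sleep(delay)  # stagger startup, as in the original loop
    return procs


def stop_engines(procs):
    """Terminate any ssh clients still running, then reap them all."""
    for p in procs:
        if p.poll() is None:
            p.terminate()
    for p in procs:
        p.wait()
```

Calling stop_engines(procs) at shutdown cleans up the local ssh clients.
Whether the remote ipengine exits along with its ssh client depends on your
ssh setup, so treat that part as something to verify.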


> --Toby
>