[IPython-User] ipcluster in ssh mode -

MinRK benjaminrk@gmail....
Mon Aug 15 15:54:47 CDT 2011


On Fri, Aug 12, 2011 at 03:44, Manuel Jung <mjung@astrophysik.uni-kiel.de> wrote:

> Hi,
>
> So I have been browsing the sources, looking for a way to get my use case
> built into ipcluster, because I felt stupid writing a script to set up the
> cluster when that is what ipcluster should do for me.
>

Honestly, there are many situations for which writing a simple bash script
or using screen will always be better than ipcluster.  Since there are so
many possible configurations and considerations, a general tool will always
be more complicated than one that caters to a particular environment.  For
instance, you should probably create only one set of tunnels per machine,
rather than per engine, since 16*8 tunnels is *a lot*, and largely pointless
(it is entirely workload/system dependent which is preferable).  Certainly,
ipcluster should handle your case better, but that doesn't mean it's the
ideal tool for you.
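
To illustrate the "one set of tunnels per machine" idea, here is a minimal
sketch that builds a single `ssh -N` command per engine host, no matter how
many engines run there.  The port list is copied from the controller_args
quoted in this thread; the helper names (`tunnel_command`,
`tunnels_per_cluster`) are made up for this example and are not part of
ipcluster.

```python
# Engine-facing ports, taken from the controller_args quoted in this thread.
ENGINE_PORTS = [10101, 10102, 10112, 10103, 10104, 10105]


def tunnel_command(controller_host, ports, bind="127.0.0.1"):
    """Build one `ssh -N` command that forwards every port to the controller.

    Hypothetical helper for illustration; not an ipcluster API.
    """
    cmd = ["ssh", controller_host, "-N"]
    for port in ports:
        # -L local_port:host:remote_port forwards a local port over SSH
        cmd += ["-L", "%d:%s:%d" % (port, bind, port)]
    return cmd


def tunnels_per_cluster(engines, controller_host, ports):
    """Return exactly one tunnel command per engine host."""
    return {host: tunnel_command(controller_host, ports) for host in engines}


if __name__ == "__main__":
    engines = {"pluto": 16, "merkur": 4}  # host -> number of engines
    commands = tunnels_per_cluster(engines, "dwarf20", ENGINE_PORTS)
    for host in sorted(commands):
        print(host, " ".join(commands[host]))
```

With this layout, pluto gets one ssh process carrying six forwards instead of
sixteen separate sets.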


> So this is my solution:
>
> Since we are on a totally restricted network/PC and ports can never be
> reached (except for ssh/22) from outside localhost, it is totally safe to
> choose static ports, as you suggested in your first post in this thread.
>
> c.LocalControllerLauncher.controller_args = ['--log-level=20',
> '--ip=0.0.0.0', '--location=127.0.0.1', '--port=10101',
> '--HubFactory.hb=10102,10112', '--HubFactory.control=10203,10103',
> '--HubFactory.mux=10204,10104', '--HubFactory.task=10205,10105']
>
> For tunneling from the engines' host, I have implemented an additional
> parameter for the SSHEngineSetLauncher. It allows running a shell command on
> the engines' host. In this case it is used to establish all tunnels.
>
> tunnel = [('ssh dwarf20 -N -L10101:127.0.0.1:10101 -L10102:127.0.0.1:10102 '
>            '-L10112:127.0.0.1:10112 -L10103:127.0.0.1:10103 '
>            '-L10104:127.0.0.1:10104 -L10105:127.0.0.1:10105').split()]
> c.SSHEngineSetLauncher.engines = {'pluto' : (16, None, tunnel),
>                                   'merkur' : (4, None, tunnel)}
>
> (dwarf20 is the client PC that starts the cluster and hosts the controller;
> pluto and merkur are the number-crunching servers, i.e. the engines' hosts.)
>
> Let me say at this point that establishing tunnels for all ports in one
> command isn't always a good idea, because they share the same TCP
> connection and bandwidth is restricted on a per-connection basis. So
> this may be a bottleneck under high load.
>

I think it is highly unlikely that putting all the traffic of a single
engine on one TCP connection would be a bottleneck, because under most
normal usage, there will not be significant traffic on multiple sockets at
the same time.  A case that could bottleneck would be a very large number of
very short tasks that print a lot to stdout/stderr.


>
> Still, this is not enough to get all connections working. On pluto, with
> 16 cores, I often saw fewer than 16 successfully connected engines. I
> found that concurrent unauthenticated connections to an sshd are limited to
> 10 by the MaxStartups parameter (see sshd_config(5)). So I introduced a new
> parameter for delaying consecutive ssh connections.
>
> c.SSHEngineSetLauncher.delay = 0.2
>

delay is great, I will actually add it to the LocalEngineSetLauncher,
because it should even be useful at that level (SSHEngineSetLauncher will
inherit it).
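
For anyone following along, the idea behind `delay` can be sketched roughly
like this: start the processes one at a time and sleep between consecutive
launches, so that sshd never sees more simultaneous connection attempts than
MaxStartups allows.  This is only an illustration, not the code in the
launcher; `launch_staggered` is a made-up name, and the `sleep` argument exists
only so the function can be exercised without real waiting.

```python
import subprocess
import sys
import time


def launch_staggered(commands, delay=0.2, sleep=time.sleep):
    """Start each command in order, pausing `delay` seconds between starts.

    Hypothetical helper for illustration; not the launcher's actual code.
    """
    procs = []
    for i, cmd in enumerate(commands):
        if i > 0:
            # No pause before the first process, one pause before each later one
            sleep(delay)
        procs.append(subprocess.Popen(cmd))
    return procs


if __name__ == "__main__":
    # Stand-in commands; a real launcher would run something like
    # ['ssh', 'pluto', 'ipengine', ...] here.
    cmds = [[sys.executable, "-c", "pass"] for _ in range(3)]
    for proc in launch_staggered(cmds, delay=0.2):
        proc.wait()
```

With 16 engines and a delay of 0.2, the launches are spread over about three
seconds.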


>
> The complete setup from ipcluster_config.py can be found in the
> postscript. I have created a branch on GitHub for this, see
> https://github.com/gzahl/ipython/tree/sshenvironment
>
> This works for me at the moment. What do you think about this solution?
>

Thanks for working this out! I think it should be a good starting point.
Some things should probably change: we don't want to define a class
n*p-times inside another, and reindenting the code so it doesn't match the
rest of the file is probably not desirable.  It's possible that a simple
configurable preflight script on the SSHLauncher would be a cleaner
solution, and provide an avenue for more general customization.


> Some last thoughts:
> - It would be nice if one didn't have to specify the port configuration
> and tunnel command explicitly. It would be nice if you could just define the
> ports and set tunneling=yes. But I'm not sure yet how this could best be
> done.
>

Something like this would definitely be valuable.  Note that with my
enginessh branch, if the Controller was launched with --enginessh=anything,
then tunneling *will* be enabled by default (if the `ssh` field of the JSON
file is specified, it is used. You can edit it manually after starting the
controller, if you like).

The problem with just 'tunneling=yes' is that it's extremely variable what
tunneling will look like. We can support one or two simple cases (like the
one I cover in enginessh).


> - I have to define '--profile=ssh' in the program_args for the
> SSHEngineLauncher. Shouldn't this be chosen automatically if I'm starting
> with "ipcluster start --profile=ssh"? It seems like a bug to me.
>

This presumes that the profile exists on the remote machine and that the
initial profile was specified by name and not by path, which is
insufficiently general.  What should actually happen is to send the
connection file and use it explicitly, with no assumptions about the remote
filesystem, or remote profiles available.
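
As a rough sketch of what "send the connection file and use it explicitly"
could look like, the snippet below only builds the two commands involved: one
to copy the engine connection file to the remote host, one to start an engine
pointing at that copy.  Assumptions to check against your installation: that
ipengine accepts a `--file=` argument naming the connection file, and that
`/tmp` is an acceptable remote location (the file contains the connection
key, so a private directory is probably a better choice).  `engine_commands`
is a made-up name, not an existing launcher method.

```python
import os
import posixpath


def engine_commands(host, connection_file, remote_dir="/tmp"):
    """Build (copy, start) commands that use an explicit connection file.

    Hypothetical helper; `remote_dir` and the `--file=` flag are assumptions.
    """
    # Keep only the file name locally, then join it with a POSIX remote path
    name = os.path.basename(connection_file)
    remote_path = posixpath.join(remote_dir, name)
    copy = ["scp", connection_file, "%s:%s" % (host, remote_path)]
    start = ["ssh", host, "ipengine", "--file=%s" % remote_path]
    return copy, start


if __name__ == "__main__":
    copy, start = engine_commands("pluto", "ipcontroller-engine.json")
    print(" ".join(copy))
    print(" ".join(start))
```

Nothing here depends on a profile of the same name existing on the remote
machine, which is the point.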


> - I was testing with the ControlMaster feature of SSH (version 4 or
> greater). It reuses an existing TCP connection and can speed up new ssh
> connections. But one would run into the only-one-TCP-connection issue
> again. Do you know this feature? I'm not sure if it is of use in this case,
> but it could help to lower the SSHEngineSetLauncher.delay parameter.
>

I am aware of it.  Since we can't depend on it, I'm not sure how valuable it
is to ipcluster in general (another case where writing against your own
environment lets you make assumptions that aren't appropriate for
ipcluster).
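
If you do control the environment, a wrapper along these lines is one way to
try connection sharing.  It is a sketch that assumes an OpenSSH client with
ControlMaster support; the `multiplexed_ssh` name and the default socket path
are invented for this example.  Note that every session to the same host then
travels over one TCP connection, which is exactly the trade-off raised in the
quoted text.

```python
def multiplexed_ssh(host, remote_cmd, control_path="~/.ssh/cm-%r@%h:%p"):
    """Build an ssh command that shares one connection per user/host/port.

    Hypothetical helper; the default control_path is an arbitrary choice.
    """
    return [
        "ssh",
        # Become the master if no shared connection exists yet, else reuse it
        "-o", "ControlMaster=auto",
        # Socket path ssh uses to locate the shared connection
        "-o", "ControlPath=" + control_path,
        host,
    ] + list(remote_cmd)


if __name__ == "__main__":
    print(" ".join(multiplexed_ssh("pluto", ["ipengine"])))
```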

In general, the SSH launchers need to be improved.  The sshx code in 0.10.2
was better in many ways, but not in others.


>
> Cheers
> Manuel
>
>
> ipcluster_config.py:
>
> c = get_config()
> c.IPClusterStart.engine_launcher_class = 'SSHEngineSetLauncher'
> c.IPClusterStart.delay = 2.0
> c.LocalControllerLauncher.controller_args = ['--log-level=20',
> '--ip=0.0.0.0', '--location=127.0.0.1', '--port=10101',
> '--HubFactory.hb=10102,10112', '--HubFactory.control=10203,10103',
> '--HubFactory.mux=10204,10104', '--HubFactory.task=10205,10105']
> # Are hard-coded paths really a reasonable default? On my systems this
> # doesn't make much sense.
>

For local launchers, they absolutely are.  This means that the programs will
be run from the same Python, etc. as the ipcluster script.  Otherwise there
could be weird situations where 'ipcontroller' launched in a subprocess
actually points to a different Python or IPython than the one launching it
(this has happened *many* times).
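
The underlying idea, shown in isolation: when a child process is started via
`sys.executable` rather than by a bare command name looked up on PATH, it is
guaranteed to run under the interpreter that launched it.  The snippet below
is a minimal demonstration of that principle, not the launcher code itself;
`run_with_same_python` is a made-up name.

```python
import subprocess
import sys


def run_with_same_python(code):
    """Run `code` in a child process using the interpreter running this script.

    Hypothetical helper for illustration only.
    """
    out = subprocess.check_output([sys.executable, "-c", code])
    return out.decode().strip()


if __name__ == "__main__":
    # The child reports which interpreter it is running under
    print("parent:", sys.executable)
    print("child: ", run_with_same_python("import sys; print(sys.executable)"))
```

A bare 'ipcontroller' gives no such guarantee, because whichever script
comes first on PATH wins.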



> c.LocalControllerLauncher.controller_cmd = ['ipcontroller']
> c.SSHEngineLauncher.program = ['ipengine']
> c.SSHEngineLauncher.program_args = ['--log_level=20', '--profile=ssh']
> c.SSHEngineSetLauncher.engine_args = ['--log-level=20', '--profile=ssh']
> c.SSHEngineSetLauncher.delay = 0.2
> tunnel = [('ssh dwarf20 -N -L10101:127.0.0.1:10101 -L10102:127.0.0.1:10102 '
>            '-L10112:127.0.0.1:10112 -L10103:127.0.0.1:10103 '
>            '-L10104:127.0.0.1:10104 -L10105:127.0.0.1:10105').split()]
> c.SSHEngineSetLauncher.engines = {'pluto' : (16, None, tunnel),
>                                   'merkur' : (4, None, tunnel)}
>
>