Hi,<div><br></div><div>I have been browsing the sources, looking for a way to get my use case working with the built-in ipcluster, because it felt silly to write my own script to set up the cluster when that is exactly what ipcluster should do for me. </div>
<div>So this is my solution:</div><div><br></div><div>Since we are on a totally restricted network and no port other than ssh/22 can be reached from outside of localhost, it is totally safe to choose static ports, as you suggested in your first post in this thread. </div>
<div><br></div><div><div><div><div>c.LocalControllerLauncher.controller_args = ['--log-level=20', '--ip=0.0.0.0', '--location=127.0.0.1', '--port=10101', '--HubFactory.hb=10102,10112', '--HubFactory.control=10203,10103', '--HubFactory.mux=10204,10104', '--HubFactory.task=10205,10105']</div>
</div></div></div><div><br></div><div>For tunneling from the engines' hosts, I have implemented an additional parameter for the SSHEngineSetLauncher. It allows running a shell command on each engine host; in this case it is used to establish all the tunnels. </div>
<div><br></div><div><div><div>tunnel = ['ssh dwarf20 -N -L10101:127.0.0.1:10101 -L10102:127.0.0.1:10102 -L10112:127.0.0.1:10112 -L10103:127.0.0.1:10103 -L10104:127.0.0.1:10104 -L10105:127.0.0.1:10105'.split()]</div>
</div><div>c.SSHEngineSetLauncher.engines = {'pluto' : (16, None, tunnel),</div><div> 'merkur' : (4, None, tunnel)}</div><div><br></div><div>(dwarf20 is the PC that starts the cluster as client and hosts the controller; pluto and merkur are the number-crunching servers, i.e. the engines' hosts.)</div>
<div><br></div><div><div>Let me say at this point that establishing the tunnels for all ports in one command isn't always a good idea: they all share the same TCP connection, and bandwidth is limited on a per-connection basis. So this may become a bottleneck under high load.</div>
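<div><br></div><div>To make the trade-off concrete, here is a rough sketch of how the tunnel command(s) could be generated from a port list, either as one ssh process for all ports or as one process per port. The helper name tunnel_commands and its per_port switch are made up for illustration; they are not part of IPython or of my branch.</div>

```python
def tunnel_commands(gateway, ports, per_port=False):
    """Return a list of ssh argv lists that forward each local port
    to the same port on the gateway's loopback interface."""
    forwards = ["-L{0}:127.0.0.1:{0}".format(port) for port in ports]
    if per_port:
        # One ssh process, and therefore one TCP connection, per port.
        return [["ssh", gateway, "-N", forward] for forward in forwards]
    # A single ssh process carrying every forward over one connection.
    return [["ssh", gateway, "-N"] + forwards]


# Engine-side ports taken from the controller arguments in this post.
ENGINE_PORTS = [10101, 10102, 10112, 10103, 10104, 10105]
tunnel = tunnel_commands("dwarf20", ENGINE_PORTS)
```

<div>With per_port=False this yields the same nested list as the hand-written tunnel value; with per_port=True every port gets its own TCP connection, at the price of more simultaneous ssh logins.</div>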
</div><div><br></div><div>Still, this is not enough to get all connections working. On pluto, with 16 cores, I often ended up with fewer than 16 successfully connected engines. I found that the number of simultaneous connections to an sshd that are still authenticating is limited to 10 by the MaxStartups parameter (see sshd_config(5)). So I introduced a new parameter that delays consecutive ssh connections.</div>
<div><br></div><div><div>c.SSHEngineSetLauncher.delay = 0.2</div></div><div><br></div><div>The complete setup from ipcluster_config.py can be found in the postscript below. I have created a branch on GitHub for this, see <a href="https://github.com/gzahl/ipython/tree/sshenvironment">https://github.com/gzahl/ipython/tree/sshenvironment</a></div>
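<div><br></div><div>The idea behind the delay, reduced to its core, looks roughly like this. It is a simplified sketch and not the code from the branch; launch_staggered and its launch/sleep parameters are hypothetical names chosen so the behaviour can be tried without opening real ssh connections.</div>

```python
import subprocess
import time


def launch_staggered(commands, delay=0.2, launch=subprocess.Popen, sleep=time.sleep):
    """Start each command in turn, pausing `delay` seconds between
    consecutive launches so that logins do not hit sshd all at once."""
    handles = []
    for index, argv in enumerate(commands):
        if index > 0:
            # No pause before the first launch, only between launches.
            sleep(delay)
        handles.append(launch(argv))
    return handles
```

<div>For 16 engines and delay = 0.2 the last launch starts about 3 seconds after the first, which spreads the authentications out over time.</div>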
<div><br></div><div>This works for me at the moment. What do you think about this solution?</div><div><br></div><div>A few last thoughts:</div><div>- It would be nice not to have to spell out the port configuration and the tunnel command explicitly; ideally one would only define the ports and set something like tunneling=yes. But I'm not sure yet how this could best be done.</div>
<div>- I have to put '--profile=ssh' in the program_args for the SSHEngineLauncher. Shouldn't this be chosen automatically if I start with "ipcluster start --profile=ssh"? It looks like a bug to me.</div>
<div>- I was testing the ControlMaster feature of SSH (version 4 or greater). It reuses an existing TCP connection and can speed up new ssh connections, but one would run into the only-one-TCP-connection issue again. Do you know this feature? I'm not sure whether it is of use in this case, but it could help to lower the SSHEngineSetLauncher.delay parameter.</div>
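<div><br></div><div>For the ControlMaster idea, this is roughly how the options could be added to an ssh command line. ControlMaster and ControlPath are standard OpenSSH client options; the helper with_connection_sharing and the socket path are only assumptions for illustration, and none of this is wired into the launcher.</div>

```python
def with_connection_sharing(ssh_argv, control_path="~/.ssh/cm-%r@%h:%p"):
    """Return a copy of an ssh argv list with connection-sharing options
    inserted right after the program name."""
    options = [
        "-o", "ControlMaster=auto",           # reuse a master connection if one exists
        "-o", "ControlPath=" + control_path,  # socket path; %r, %h, %p are expanded by ssh
    ]
    return [ssh_argv[0]] + options + list(ssh_argv[1:])
```

<div>For example, with_connection_sharing(['ssh', 'pluto', 'ipengine']) puts the two -o options in front of the host name, so later connections to pluto can reuse the first one.</div>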
<div><br></div><div>Cheers</div><div>Manuel</div><div><br></div><div><br></div><div>ipcluster_config.py:</div><div><br></div><div>c = get_config()</div><div>c.IPClusterStart.engine_launcher_class = 'SSHEngineSetLauncher'</div>
<div>c.IPClusterStart.delay = 2.0</div><div><div>c.LocalControllerLauncher.controller_args = ['--log-level=20', '--ip=0.0.0.0', '--location=127.0.0.1', '--port=10101', '--HubFactory.hb=10102,10112', '--HubFactory.control=10203,10103', '--HubFactory.mux=10204,10104', '--HubFactory.task=10205,10105']</div>
</div><div># Are hard coded paths really a reasonable default? On my systems this doesn't make much sense.</div><div>c.LocalControllerLauncher.controller_cmd = ['ipcontroller']</div><div>c.SSHEngineLauncher.program = ['ipengine']</div>
<div>c.SSHEngineLauncher.program_args = ['--log_level=20', '--profile=ssh']</div><div>c.SSHEngineSetLauncher.engine_args = ['--log-level=20', '--profile=ssh']</div><div>c.SSHEngineSetLauncher.delay = 0.2</div>
<div>tunnel = ['ssh dwarf20 -N -L10101:127.0.0.1:10101 -L10102:127.0.0.1:10102 -L10112:127.0.0.1:10112 -L10103:127.0.0.1:10103 -L10104:127.0.0.1:10104 -L10105:127.0.0.1:10105'.split()]</div>
<div>c.SSHEngineSetLauncher.engines = {'pluto' : (16, None, tunnel),</div><div> 'merkur' : (4, None, tunnel)}</div><div><br></div></div>