[IPython-User] ipcluster in ssh mode

MinRK benjaminrk@gmail....
Tue Aug 9 11:54:41 CDT 2011


On Tue, Aug 9, 2011 at 05:44, Manuel Jung <mjung@astrophysik.uni-kiel.de> wrote:

>
>
> 2011/8/9 MinRK <benjaminrk@gmail.com>
>
>>
>>
>> On Mon, Aug 8, 2011 at 14:30, Manuel Jung <mjung@astrophysik.uni-kiel.de> wrote:
>>
>>> This is awesome! Thanks a lot! I gave your latest IPython git version a
>>> test, and the registration with ssh tunneling for engines works. But I am
>>> not able to process any tasks. I get:
>>>
>>> rc.ids==[0]
>>> rc[:].map_sync(lambda x: x**2, range(10))
>>>
>>> ... does not return! Also, no load shows up on the engine. How can I
>>> debug this? There is no output in the ipcluster log.
>>>
>>
>> My fault, see below.
>>
>> For debugging, I always recommend using ipcontroller/ipengine instead of
>> ipcluster, and add the `--debug` flag to maximize logging output.  It's much
>> easier to make sense of what's going on, even if it's not quite as
>> convenient.
>>
>>
>>
>>>
>>> Could you explain why there are 6 instances of ipengine showing up in
>>> htop on my cluster node? (n==1)
>>>
>>
>> The tunnels are launched as subprocesses, so there should be 8: one for
>> each of the seven(!) tunnels, plus one for the engine itself.  The fact
>> that there are only 6 means two are missing, and it turns out that I
>> managed to forget to forward the shell streams (the ones used for
>> execution - pretty important).  I just rebased the branch on master with
>> fixes, so if you check it out again it should hopefully be in working
>> order.
>>
>> -MinRK
>>
>
> Ok, now it works for n==1 and for n==4. But if I configure n==16, some
> engines fail to launch. This seems to be related to the error
>
> [IPClusterStart] Warning: Identity file ~/.ssh/ip_dsa.pub not accessible:
> No such file or directory.
>
> But why does this happen? Multiple reads shouldn't be a problem?
>

128 simultaneous reads is a lot, so that could be hitting some limit.


>
> I also tried without specifying an identity file, but using the system
> default one - still some ssh tunnels are failing.
>
> A suggestion: Would it be possible/easier to just build one ssh connection
> with all tunnels?
>

I think this is possible, but I would have to rearrange some things.
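For what it's worth, ssh itself can already carry all the forwards over a single connection if they are given as repeated `-L` options. A manual sketch; the hostname is a placeholder, and the port numbers are taken from the example further down in this thread:

```shell
# One ssh process, one TCP connection, all six engine-facing forwards
# (controller.example.com and the ports are placeholders for your setup).
ssh controller.example.com -f -N \
    -L 10101:127.0.0.1:10101 \
    -L 10102:127.0.0.1:10102 \
    -L 10112:127.0.0.1:10112 \
    -L 10103:127.0.0.1:10103 \
    -L 10104:127.0.0.1:10104 \
    -L 10105:127.0.0.1:10105
```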


> The process flood gets a little bit overwhelming with 16 cores:
> 16*8 == 128 processes.
>

This is why manual tunnels make more sense for engines.  It's silly to set
up a separate set of tunnels for each engine, when they are all pointing to
the same place.

Even if I fold them all into one process, you are still forwarding 128 ports
to the same 8 for no reason other than that the engines don't know each other
exist (and the engines must be allowed to shut down in any order).  If you do
the tunneling manually, and let the engines think they are local, then this
will all be more efficient.
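Concretely, that would look something like the following sketch, which mirrors the manual-tunnel recipe quoted further down ($controller, the port list, and the connection-file path are placeholders): one set of forwards per host, shared by every local engine.

```shell
# Once per host: forward the controller's engine-facing ports.
for port in 10101 10102 10112 10103 10104 10105; do
    ssh $controller -f -N -L $port:127.0.0.1:$port
done

# Then start all 16 engines; each connects to localhost and
# shares the same six tunnels.
for i in $(seq 16); do
    ipengine --f=/path/to/ipcontroller-engine.json &
done
```

Six tunnel processes per host instead of six per engine: 6 rather than 96 for a 16-core node.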


>
> And maybe these could be made subprocesses of the ipengine call? Even if
> they time out after 15 seconds, this would be logical, wouldn't it?
>

They are launched with pexpect, which I assumed would make them
subprocesses, but that might not be the case.
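(A note on the idea, not on IPython's actual implementation: ssh's `-f` flag makes ssh fork and detach, so the tunnel outlives whatever launched it. Dropping `-f` and launching via subprocess would keep the tunnel a direct child of the engine process, as Manuel suggests. A sketch; the helper name and hostname are made up:)

```python
import subprocess

def tunnel_cmd(server, ports, remote_host="127.0.0.1"):
    """Build one ssh command forwarding all ports over a single connection.

    Note the absence of -f: without it, the ssh process stays a direct
    child of the caller, so the tunnels die when the engine process does.
    """
    cmd = ["ssh", server, "-N"]
    for port in ports:
        cmd += ["-L", "%i:%s:%i" % (port, remote_host, port)]
    return cmd

cmd = tunnel_cmd("controller.example.com", [10101, 10102, 10112])
print(" ".join(cmd))

# Launching it (commented out, since it needs real ssh access):
# tunnel = subprocess.Popen(cmd)
# ...
# tunnel.terminate()  # or let it die with the parent
```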


>
> Cheers,
> Manuel
>
> Ps.: I am attaching the ipcluster log. Maybe it helps.
>
>
>
>
>
>>
>>> Also I get some failing tunnel setups for n>1, but let us focus on n==1
>>> for now.
>>>
>>>
>>>
>>>
>>> 2011/8/8 MinRK <benjaminrk@gmail.com>
>>>
>>>> As I mentioned, it was quite straightforward to add tunneling support,
>>>> at least for the simplest case:
>>>>
>>>> https://github.com/ipython/ipython/pull/685
>>>>
>>>> :)
>>>>
>>>> -MinRK
>>>>
>>>>
>>>> On Sun, Aug 7, 2011 at 15:17, MinRK <benjaminrk@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Sun, Aug 7, 2011 at 14:25, Manuel Jung <
>>>>> mjung@astrophysik.uni-kiel.de> wrote:
>>>>>
>>>>>> Well, no answers yet, but I made some progress.
>>>>>>
>>>>>> I was not able to work around the error, but I think I understand now
>>>>>> why this does not work.
>>>>>>
>>>>>> The error appears because the registration is successful, but
>>>>>> everything else, like the heartbeat, fails: no ports were forwarded
>>>>>> for these operations.
>>>>>>
>>>>>> It is stated here
>>>>>>
>>>>>>
>>>>>> http://ipython.org/ipython-doc/stable/parallel/parallel_security.html#ssh
>>>>>>
>>>>>> that tunneling for engines (which I tried) is not supported at the
>>>>>> moment. I tried to work around this, but only created a tunnel for the
>>>>>> registration socket - not for the other sockets, which are used by the
>>>>>> engines. An overview of them is given here:
>>>>>>
>>>>>>
>>>>>> http://ipython.org/ipython-doc/stable/development/parallel_connections.html#all-connections
>>>>>>
>>>>>> Well, I did specify the registration port, but I did not specify the
>>>>>> ports for heartbeats etc. Can I do this myself to get homebrew engine
>>>>>> tunneling? I saw a bunch of options in the controller configuration
>>>>>> which are maybe related, but didn't quite understand which ones I had
>>>>>> to alter.
>>>>>>
>>>>>> Maybe someone could point out why there is no tunneling support for
>>>>>> engines (yet)? Is there any particular reason for this, other than
>>>>>> that nobody has done it yet?
>>>>>>
>>>>>
>>>>>  Correct, some amount of ssh tunneling will be added to the engine, it
>>>>> just hasn't been done.  The reason it's a lower priority than the
>>>>> client-controller connections is that it's rarer for engines to be
>>>>> unable to see the controller directly.  It's also slightly less
>>>>> valuable, because engines are often run in environments that cannot
>>>>> accept input, so only passwordless ssh will work.  The client tunnels
>>>>> allow for input of a password (though I doubt that it works in every
>>>>> case).
>>>>>
>>>>>
>>>>> As it stands now, there's no way to tell the engine to ignore the
>>>>> connection reply from the controller (which contains all of the
>>>>> non-registration connection info), so there are some restrictions on
>>>>> how you can trick the engine into connecting to different ports.
>>>>> Essentially you will have to set up all 6 forwarded ports, and the
>>>>> Controller must be listening on localhost (it can listen on more than
>>>>> just localhost, e.g. 0.0.0.0 for all interfaces).
>>>>>
>>>>> Prevent the JSON connection file from resolving localhost connections
>>>>> to the controller's external IP by explicitly specifying the loopback
>>>>> address:
>>>>>
>>>>> ipcontroller --ip=0.0.0.0 --location=127.0.0.1
>>>>>
>>>>> That way, engines will always try to connect to localhost, regardless
>>>>> of where the Controller actually is running, enabling them to use your
>>>>> tunnels.
>>>>>
>>>>> First, you must specify (or retrieve from the controller's debug
>>>>> output) all of the ports the controller is listening on for engine
>>>>> connections:
>>>>>
>>>>> in ipcontroller_config.py:
>>>>> # port-pairs:
>>>>> c.HubFactory.iopub
>>>>> c.HubFactory.hb
>>>>> c.HubFactory.task
>>>>> c.HubFactory.mux
>>>>> c.HubFactory.control
>>>>>
>>>>> Then you can specify the tunnels manually (the local ports *must* be
>>>>> the same, for now). That will be the first port of each Queue (iopub, task,
>>>>> mux, control) and both hb ports, and the registration port.
>>>>>
>>>>> So, I was able to get this running with the following commands:
>>>>>
>>>>> 1. start the controller, listening on all interfaces and forcing
>>>>> loopback IP for disambiguation:
>>>>>
>>>>>  [controller] $> ipcontroller --ip=0.0.0.0 --location=127.0.0.1
>>>>> --port=10101 --HubFactory.hb=10102,10112 --HubFactory.control=10203,10103
>>>>> --HubFactory.mux=10204,10104 --HubFactory.task=10205,10105
>>>>>
>>>>> # (with this pattern, 101XY ports are ports visible to the engine,
>>>>> 102XY are client-only)
>>>>>
>>>>> 2. Set up forwarded ports on the engines.
>>>>>
>>>>> [engine] $> for port in 10101 10102 10112 10103 10104 10105; do ssh
>>>>> $server -f -N -L $port:$controller:$port; done
>>>>>
>>>>> In my case, $server was a third machine that I have ssh access to that
>>>>> has access to $controller, where the controller process is running.  If you
>>>>> are tunneling directly, then $server would be the controller's IP, and
>>>>> $controller would be 127.0.0.1
>>>>>
>>>>> 3. connect the engine
>>>>>
>>>>> [engine] $>  ipengine --f=/path/to/ipcontroller-engine.json
>>>>>
>>>>> # note that if you are on a shared filesystem, just `ipengine` should
>>>>> work.
>>>>>
>>>>> Implementing support for the easiest case should be quite
>>>>> straightforward, and less tedious than this. (Pull requests welcome!).
>>>>>
>>>>> I hope that helps.
>>>>>
>>>>> -MinRK
>>>>>
>>>>>
>>>>>> Thanks!
>>>>>> Manuel
>>>>>>
>>>>>> _______________________________________________
>>>>>> IPython-User mailing list
>>>>>> IPython-User@scipy.org
>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-user
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>