[IPython-User] ipcluster in ssh mode -

Manuel Jung mjung@astrophysik.uni-kiel...
Tue Aug 9 07:44:14 CDT 2011


2011/8/9 MinRK <benjaminrk@gmail.com>

>
>
> On Mon, Aug 8, 2011 at 14:30, Manuel Jung <mjung@astrophysik.uni-kiel.de>wrote:
>
>> This is awesome! Thanks a lot! I gave your latest ipython git version a
>> test and the registration with ssh tunneling for engines works. But I am not
>> able to process any tasks. I get:
>>
>> rc.ids==[0]
>> rc[:].map_sync(lambda x: x**2, range(10))
>>
>> ... does not return! Also, no load is registered on the engine. How can I
>> debug this? There is no output in the ipcluster log.
>>
>
> My fault, see below.
>
> For debugging, I always recommend using ipcontroller/ipengine instead of
> ipcluster, and add the `--debug` flag to maximize logging output.  It's much
> easier to make sense of what's going on, even if it's not quite as
> convenient.
>
>
>
>>
>> Could you explain why there are 6 instances of ipengine showing up in
>> htop on my cluster node? (n==1)
>>
>
> The tunnels are launched as subprocesses, so there should be 8: one for each
> of seven(!) tunnels, plus one for the engine itself.  The fact that there
> are only 6 means two are missing, and it turns out that I managed to forget
> to forward the shell streams (the ones used for execution - pretty
> important).  I just rebased the branch on master with fixes, so if you check
> it out again it should hopefully be in working order.
>
> -MinRK
>

Ok, now it works for n==1 and for n==4. But if I configure n==16, some
engines fail to launch. This seems to be related to the error

[IPClusterStart] Warning: Identity file ~/.ssh/ip_dsa.pub not accessible: No
such file or directory.

But why does this happen? Multiple reads shouldn't be a problem?

I also tried without specifying an identity file, using the system
default one instead, but some ssh tunnels still fail.
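
To get an overview of which tunnels fail, the log can be scanned for the
"RuntimeError: tunnel '...' failed to start" lines. The following is only a
minimal sketch in Python, assuming the line format of the attached ipcluster
log; the helper names failed_tunnels and remote_port are made up for
illustration and are not part of IPython.

```python
import re
from collections import Counter

# Matches the "tunnel '...' failed to start" RuntimeError lines seen in the log.
TUNNEL_FAILURE = re.compile(r"RuntimeError: tunnel '([^']+)' failed to start")


def failed_tunnels(log_lines):
    """Count how often each tunnel command is reported as failed."""
    counts = Counter()
    for line in log_lines:
        match = TUNNEL_FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def remote_port(tunnel_cmd):
    """Return the controller-side port of an 'ssh ... -L a:b:c:d ...' command."""
    spec = tunnel_cmd.split(" -L ", 1)[1].split()[0]
    return int(spec.rsplit(":", 1)[1])


# Two lines taken from the attached log, plus one unrelated line.
sample = [
    "[IPClusterStart] RuntimeError: tunnel 'ssh -i ~/.ssh/ip_dsa.pub -f -L "
    "127.0.0.1:42736:127.0.0.1:57968 dwarf20 sleep 15' failed to start",
    "[IPClusterStart] RuntimeError: tunnel 'ssh -i ~/.ssh/ip_dsa.pub -f -L "
    "127.0.0.1:49092:127.0.0.1:50535 dwarf20 sleep 15' failed to start",
    "[IPClusterStart] [IPControllerApp] engine::Engine Connected: 0",
]

for cmd, count in sorted(failed_tunnels(sample).items()):
    print("%d x port %d: %s" % (count, remote_port(cmd), cmd))
```

Because the same failure is logged again at several levels of the event loop,
counting per command shows how many distinct tunnels are affected rather than
how many tracebacks were printed.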

A suggestion: would it be possible (or easier) to build just one ssh
connection carrying all tunnels? The flood of processes gets a little
overwhelming with 16 cores: 16*8=128 processes.
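
To illustrate the idea: ssh accepts several -L options in one invocation, so a
single ssh process per engine could carry all forwards. This is only a sketch
in Python and not how the IPython tunnel code works; the function names, the
example ports and the assumption that local and remote port numbers are equal
are all mine.

```python
def single_ssh_tunnel_cmd(server, ports, remote_ip="127.0.0.1"):
    """Build one ssh argv that forwards every port in `ports`.

    Each local port is forwarded to the same port number on `remote_ip`,
    as seen from `server`.
    """
    cmd = ["ssh", "-f", "-N"]  # -f: go to background, -N: no remote command
    for port in ports:
        cmd += ["-L", "127.0.0.1:%d:%s:%d" % (port, remote_ip, port)]
    cmd.append(server)
    return cmd


def process_count(n_engines, tunnels_per_engine, one_ssh_per_engine):
    """Engine processes plus ssh tunnel processes on the node."""
    ssh_per_engine = 1 if one_ssh_per_engine else tunnels_per_engine
    return n_engines * (1 + ssh_per_engine)


ports = range(10101, 10108)  # seven hypothetical ports
print(" ".join(single_ssh_tunnel_cmd("dwarf20", ports)))
print(process_count(16, 7, one_ssh_per_engine=False))  # 16 * 8
print(process_count(16, 7, one_ssh_per_engine=True))   # 16 * 2
```

With seven tunnels per engine this would bring the count for 16 engines down
from 128 to 32 processes.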

And maybe these could be made subprocesses of the ipengine call? Even if
they time out after 15 seconds, this would be logical, wouldn't it?

Cheers,
Manuel

PS: I am attaching the ipcluster log. Maybe it helps.





>
>> Also, I get some failing tunnel setups for n>1, but let us focus on n==1
>> for now.
>>
>>
>>
>>
>> 2011/8/8 MinRK <benjaminrk@gmail.com>
>>
>>> As I mentioned, it was quite straightforward to add tunneling support, at
>>> least for the simplest case:
>>>
>>> https://github.com/ipython/ipython/pull/685
>>>
>>> :)
>>>
>>> -MinRK
>>>
>>>
>>> On Sun, Aug 7, 2011 at 15:17, MinRK <benjaminrk@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Sun, Aug 7, 2011 at 14:25, Manuel Jung <
>>>> mjung@astrophysik.uni-kiel.de> wrote:
>>>>
>>>>> Well, no answers yet, but I made some progress.
>>>>>
>>>>> I was not able to work around the error, but I think I now understand
>>>>> why this does not work.
>>>>>
>>>>> The error appears because the registration is successful, but
>>>>> everything else (heartbeat etc.) fails: no ports were forwarded for
>>>>> these operations.
>>>>>
>>>>> It is stated here
>>>>>
>>>>>
>>>>> http://ipython.org/ipython-doc/stable/parallel/parallel_security.html#ssh
>>>>>
>>>>> that tunneling for engines (which I tried) is not supported at the
>>>>> moment. I tried to work around this, but only created a tunnel for the
>>>>> registration socket, not for the other sockets used by the engines. An
>>>>> overview of them is given here:
>>>>>
>>>>>
>>>>> http://ipython.org/ipython-doc/stable/development/parallel_connections.html#all-connections
>>>>>
>>>>> Well, I did specify the registration port, but I did not specify ports
>>>>> for heartbeats etc. Can I do this to get homebrew engine tunneling? I saw
>>>>> a bunch of possibly related options in the configuration for the
>>>>> controller, but didn't quite understand which ones I had to alter.
>>>>>
>>>>> Maybe someone could point out why there is no tunneling support for
>>>>> engines (yet)? Is there any particular reason for this, other than
>>>>> that nobody has done it yet?
>>>>>
>>>>
>>>> Correct, some amount of ssh tunneling will be added to the engine; it
>>>> just hasn't been done yet.  The reason it's a lower priority than the
>>>> client-controller connections is that it's rarer for engines to be unable
>>>> to see the controller directly.  It's also slightly less valuable, because
>>>> engines are often run in environments that cannot accept input, so only
>>>> passwordless ssh will work.  The client tunnels allow for input of a
>>>> password (though I doubt that it works in every case).
>>>>
>>>>
>>>> As it stands now, there's no way to tell the engine to ignore the
>>>> connection reply from the controller (which contains all of the
>>>> non-registration connection info), so there are some restrictions on how you
>>>> can trick the engine into connecting to different ports.  Essentially you
>>>> will have to set up all 6 forwarded ports, and the Controller must be
>>>> listening on localhost (possibly in addition to other interfaces, e.g.
>>>> 0.0.0.0 for all interfaces).
>>>>
>>>> Prevent the JSON connector file from disambiguating localhost
>>>> connections to the controller's external IP by specifying loopback, e.g.:
>>>>
>>>> ipcontroller --ip=0.0.0.0 --location=127.0.0.1
>>>>
>>>> That way, engines will always try to connect to localhost, regardless of
>>>> where the Controller actually is running, enabling them to use your tunnels.
>>>>
>>>> First, you must specify (or retrieve from the controller's debug output)
>>>> all of the ports the controller is listening on for engine connections:
>>>>
>>>> in ipcontroller_config.py:
>>>> # port-pairs:
>>>> c.HubFactory.iopub
>>>> c.HubFactory.hb
>>>> c.HubFactory.task
>>>> c.HubFactory.mux
>>>> c.HubFactory.control
>>>>
>>>> Then you can specify the tunnels manually (the local ports *must* be the
>>>> same, for now). That will be the first port of each Queue (iopub, task, mux,
>>>> control) and both hb ports, and the registration port.
>>>>
>>>> So, I was able to get this running with the following commands:
>>>>
>>>> 1. start the controller, listening on all interfaces and forcing
>>>> loopback IP for disambiguation:
>>>>
>>>>  [controller] $> ipcontroller --ip=0.0.0.0 --location=127.0.0.1
>>>> --port=10101 --HubFactory.hb=10102,10112 --HubFactory.control=10203,10103
>>>> --HubFactory.mux=10204,10104 --HubFactory.task=10205,10105
>>>>
>>>> # (with this pattern, 101XY ports are ports visible to the engine, 102XY
>>>> are client-only)
>>>>
>>>> 2. Set up forwarded ports on the engines.
>>>>
>>>> [engine] $> for port in 10101 10102 10112 10103 10104 10105; do ssh
>>>> $server -f -N -L $port:$controller:$port; done
>>>>
>>>> In my case, $server was a third machine that I have ssh access to that
>>>> has access to $controller, where the controller process is running.  If you
>>>> are tunneling directly, then $server would be the controller's IP, and
>>>> $controller would be 127.0.0.1
>>>>
>>>> 3. connect the engine
>>>>
>>>> [engine] $>  ipengine --f=/path/to/ipcontroller-engine.json
>>>>
>>>> # note that if you are on a shared filesystem, just `ipengine` should
>>>> work.
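
As a cross-check of the port layout in steps 1 and 2, the ports the engine has
to reach can be read off the controller command line. A small Python sketch,
assuming the 101XY naming convention of the example above (it is a convention
of this example only, nothing IPython enforces); engine_visible_ports is a
hypothetical helper, not an IPython function.

```python
# The ipcontroller command line from step 1, joined into one string.
CONTROLLER_CMD = (
    "ipcontroller --ip=0.0.0.0 --location=127.0.0.1 "
    "--port=10101 --HubFactory.hb=10102,10112 --HubFactory.control=10203,10103 "
    "--HubFactory.mux=10204,10104 --HubFactory.task=10205,10105"
)


def engine_visible_ports(cmd, prefix="101"):
    """Collect the ports whose number starts with `prefix`, in command order."""
    ports = []
    for option in cmd.split():
        name, _, value = option.partition("=")
        # Only --port and the --HubFactory.* pairs carry port numbers.
        if name == "--port" or name.startswith("--HubFactory."):
            for port in value.split(","):
                if port.startswith(prefix):
                    ports.append(int(port))
    return ports


print(engine_visible_ports(CONTROLLER_CMD))
```

The result is the same list of six ports that the for loop in step 2 forwards.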
>>>>
>>>> Implementing support for the easiest case should be quite
>>>> straightforward, and less tedious than this. (Pull requests welcome!).
>>>>
>>>> I hope that helps.
>>>>
>>>> -MinRK
>>>>
>>>>
>>>>> Thanks!
>>>>> Manuel
>>>>>
>>>>> _______________________________________________
>>>>> IPython-User mailing list
>>>>> IPython-User@scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/ipython-user
>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
[IPClusterStart] Using existing profile dir: u'/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] Starting ipcluster with [daemon=False]
[IPClusterStart] Creating pid file: /home/mjung/.config/ipython/profile_ssh/pid/ipcluster.pid
[IPClusterStart] Starting LocalControllerLauncher: ['/home/mjung/src/epd-7.1-2-rh5-x86_64/bin/python', u'/home/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/apps/ipcontrollerapp.py', '--log-level=20', u'--profile-dir=/home/mjung/.config/ipython/profile_ssh']
[IPClusterStart] Process '/home/mjung/src/epd-7.1-2-rh5-x86_64/bin/python' started: 12223
[IPClusterStart] [IPControllerApp] Using existing profile dir: u'/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPControllerApp] Hub listening on tcp://127.0.0.1:57968 for registration.
[IPClusterStart] [IPControllerApp] Hub using DB backend: 'IPython.parallel.controller.dictdb.DictDB'
[IPClusterStart] [IPControllerApp] hub::created hub
[IPClusterStart] [IPControllerApp] task::using Python leastload Task scheduler
[IPClusterStart] [IPControllerApp] Heartmonitor started
[IPClusterStart] [IPControllerApp] Creating pid file: /home/mjung/.config/ipython/profile_ssh/pid/ipcontroller.pid
[IPClusterStart] Scheduler started [leastload]
[IPClusterStart] Starting 16 engines
[IPClusterStart] Process 'ssh' started: 12242
[IPClusterStart] Starting SSHEngineSetLauncher: ['ssh', '-tt', u'pluto', 'ipengine', '--log-level=20', '--profile=ssh']
[IPClusterStart] Process 'ssh' started: 12243
[IPClusterStart] Process 'ssh' started: 12245
[IPClusterStart] Process 'ssh' started: 12246
[IPClusterStart] Process 'ssh' started: 12247
[IPClusterStart] Process 'ssh' started: 12248
[IPClusterStart] Process 'ssh' started: 12249
[IPClusterStart] Process 'ssh' started: 12250
[IPClusterStart] Process 'ssh' started: 12251
[IPClusterStart] Process 'ssh' started: 12252
[IPClusterStart] Process 'ssh' started: 12253
[IPClusterStart] Process 'ssh' started: 12254
[IPClusterStart] Process 'ssh' started: 12255
[IPClusterStart] Process 'ssh' started: 12256
[IPClusterStart] Process 'ssh' started: 12257
[IPClusterStart] Process 'ssh' started: 12258
[IPClusterStart] Process 'engine set' started: [None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Using existing profile dir: u'/astro/home/mjung/.config/ipython/profile_ssh'
[IPClusterStart] [IPEngineApp] Loading url_file u'/astro/home/mjung/.config/ipython/profile_ssh/security/ipcontroller-engine.json'
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] [IPEngineApp] Registering with controller at tcp://127.0.0.1:57968
[IPClusterStart] 255
[IPClusterStart] Warning: Identity file ~/.ssh/ip_dsa.pub not accessible: No such file or directory.

[IPClusterStart] ssh_exchange_identification: Connection closed by remote host


[IPClusterStart] 
[IPClusterStart] <class 'IPython.external.pexpect._pexpect.EOF'>
[IPClusterStart] ERROR:root:Error in periodic callback
[IPClusterStart] Traceback (most recent call last):
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 429, in _run
[IPClusterStart]     self.callback()
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 129, in register
[IPClusterStart]     connect(reg, self.url)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 104, in connect
[IPClusterStart]     password=password,
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 113, in tunnel_connection
[IPClusterStart]     new_url, tunnel = open_tunnel(addr, server, keyfile=keyfile, password=password, paramiko=paramiko)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 139, in open_tunnel
[IPClusterStart]     tunnel = tunnelf(lport, rport, server, remoteip=ip, keyfile=keyfile, password=password)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 194, in openssh_tunnel
[IPClusterStart]     raise RuntimeError("tunnel '%s' failed to start"%(cmd))
[IPClusterStart] RuntimeError: tunnel 'ssh -i ~/.ssh/ip_dsa.pub -f -L 127.0.0.1:42736:127.0.0.1:57968 dwarf20 sleep 15' failed to start
[IPClusterStart] [IPControllerApp] client::client 'c5212881-9204-4968-9f9e-aedd8364db5b' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client 'bdb17be7-934c-4310-bf27-9ad3e4f63389' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client 'e9fe9d0e-7659-4719-ae72-06237cdec38a' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client '4043186a-89d7-4478-b33d-39387dd75b37' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client '2d647e60-7f44-4a9d-81d0-d7a14e627ead' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client '5069b0bc-f3f0-4f05-9bdc-85b8f1994cae' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client '8f9fe2e4-3a3a-4aff-93c8-d253b2158e71' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client '3433f58e-8481-4cb8-a484-02e511574176' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client '82b4c9ac-8da4-4db4-9da2-ce696f1a844d' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client '89e8c0c6-a5ba-46f3-9aff-f555085d3693' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client 'b1a9dc74-98e3-465c-a885-d06596c1752f' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client 'd2bba957-f1b6-4001-baa1-69730ac6406c' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client 'ca00bda8-3eb3-4868-a041-d0a75092f789' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client '0f2b313d-d49b-4e87-90fc-6e919f1dd67b' requested u'registration_request'
[IPClusterStart] [IPControllerApp] client::client '637903a9-4d7a-4360-9d6d-e05a11a06f73' requested u'registration_request'
[IPClusterStart] 255
[IPClusterStart] Warning: Identity file ~/.ssh/ip_dsa.pub not accessible: No such file or directory.

[IPClusterStart] ssh_exchange_identification: Connection closed by remote host


[IPClusterStart] 
[IPClusterStart] <class 'IPython.external.pexpect._pexpect.EOF'>
[IPClusterStart] ERROR:root:Uncaught exception, closing connection.
[IPClusterStart] Traceback (most recent call last):
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 355, in _run_callback
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/stack_context.py", line 133, in wrapped
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 134, in <lambda>
[IPClusterStart]     self.registrar.on_recv(lambda msg: self.complete_registration(msg, connect, maybe_tunnel))
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 154, in complete_registration
[IPClusterStart]     hb_addrs = [ maybe_tunnel(addr) for addr in hb_addrs ]
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 116, in maybe_tunnel
[IPClusterStart]     password=password,
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 139, in open_tunnel
[IPClusterStart]     tunnel = tunnelf(lport, rport, server, remoteip=ip, keyfile=keyfile, password=password)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 194, in openssh_tunnel
[IPClusterStart]     raise RuntimeError("tunnel '%s' failed to start"%(cmd))
[IPClusterStart] RuntimeError: tunnel 'ssh -i ~/.ssh/ip_dsa.pub -f -L 127.0.0.1:49092:127.0.0.1:50535 dwarf20 sleep 15' failed to start
[IPClusterStart] ERROR:root:Uncaught exception, closing connection.
[IPClusterStart] Traceback (most recent call last):
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 381, in _handle_events
[IPClusterStart]     self._handle_recv()
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 421, in _handle_recv
[IPClusterStart]     self._run_callback(callback, msg)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 355, in _run_callback
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/stack_context.py", line 133, in wrapped
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 134, in <lambda>
[IPClusterStart]     self.registrar.on_recv(lambda msg: self.complete_registration(msg, connect, maybe_tunnel))
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 154, in complete_registration
[IPClusterStart]     hb_addrs = [ maybe_tunnel(addr) for addr in hb_addrs ]
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 116, in maybe_tunnel
[IPClusterStart]     password=password,
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 139, in open_tunnel
[IPClusterStart]     tunnel = tunnelf(lport, rport, server, remoteip=ip, keyfile=keyfile, password=password)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 194, in openssh_tunnel
[IPClusterStart]     raise RuntimeError("tunnel '%s' failed to start"%(cmd))
[IPClusterStart] RuntimeError: tunnel 'ssh -i ~/.ssh/ip_dsa.pub -f -L 127.0.0.1:49092:127.0.0.1:50535 dwarf20 sleep 15' failed to start
[IPClusterStart] ERROR:root:Exception in I/O handler for fd <zmq.core.socket.Socket object at 0x156de68>
[IPClusterStart] Traceback (most recent call last):
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 288, in start
[IPClusterStart]     self._handlers[fd](fd, events)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/stack_context.py", line 133, in wrapped
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 381, in _handle_events
[IPClusterStart]     self._handle_recv()
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 421, in _handle_recv
[IPClusterStart]     self._run_callback(callback, msg)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 355, in _run_callback
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/stack_context.py", line 133, in wrapped
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 134, in <lambda>
[IPClusterStart]     self.registrar.on_recv(lambda msg: self.complete_registration(msg, connect, maybe_tunnel))
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 154, in complete_registration
[IPClusterStart]     hb_addrs = [ maybe_tunnel(addr) for addr in hb_addrs ]
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 116, in maybe_tunnel
[IPClusterStart]     password=password,
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 139, in open_tunnel
[IPClusterStart]     tunnel = tunnelf(lport, rport, server, remoteip=ip, keyfile=keyfile, password=password)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 194, in openssh_tunnel
[IPClusterStart]     raise RuntimeError("tunnel '%s' failed to start"%(cmd))
[IPClusterStart] RuntimeError: tunnel 'ssh -i ~/.ssh/ip_dsa.pub -f -L 127.0.0.1:49092:127.0.0.1:50535 dwarf20 sleep 15' failed to start
[IPClusterStart] [IPEngineApp] Registration timed out after 2.0 seconds
[IPClusterStart] CRITICAL:IPEngineApp:Registration timed out after 2.0 seconds
[IPClusterStart] ERROR:root:Error in periodic callback
[IPClusterStart] Traceback (most recent call last):
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 429, in _run
[IPClusterStart]     self.callback()
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 218, in abort
[IPClusterStart]     self.session.send(self.registrar, "unregistration_request", content=dict(id=self.id))
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/zmq/session.py", line 484, in send
[IPClusterStart]     raise TypeError("stream must be Socket or ZMQStream, not %r"%type(stream))
[IPClusterStart] TypeError: stream must be Socket or ZMQStream, not <type 'NoneType'>
[IPClusterStart] [IPControllerApp] registration::finished registering engine 0:'c5212881-9204-4968-9f9e-aedd8364db5b'
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 0
[IPClusterStart] 255
[IPClusterStart] Warning: Identity file ~/.ssh/ip_dsa.pub not accessible: No such file or directory.

[IPClusterStart] ssh_exchange_identification: Connection closed by remote host


[IPClusterStart] 
[IPClusterStart] <class 'IPython.external.pexpect._pexpect.EOF'>
[IPClusterStart] ERROR:root:Uncaught exception, closing connection.
[IPClusterStart] Traceback (most recent call last):
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 355, in _run_callback
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/stack_context.py", line 133, in wrapped
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 134, in <lambda>
[IPClusterStart]     self.registrar.on_recv(lambda msg: self.complete_registration(msg, connect, maybe_tunnel))
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 185, in complete_registration
[IPClusterStart]     connect(control_stream, control_addr)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 104, in connect
[IPClusterStart]     password=password,
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 113, in tunnel_connection
[IPClusterStart]     new_url, tunnel = open_tunnel(addr, server, keyfile=keyfile, password=password, paramiko=paramiko)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 139, in open_tunnel
[IPClusterStart]     tunnel = tunnelf(lport, rport, server, remoteip=ip, keyfile=keyfile, password=password)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 194, in openssh_tunnel
[IPClusterStart]     raise RuntimeError("tunnel '%s' failed to start"%(cmd))
[IPClusterStart] RuntimeError: tunnel 'ssh -i ~/.ssh/ip_dsa.pub -f -L 127.0.0.1:36302:127.0.0.1:32905 dwarf20 sleep 15' failed to start
[IPClusterStart] ERROR:root:Uncaught exception, closing connection.
[IPClusterStart] Traceback (most recent call last):
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 381, in _handle_events
[IPClusterStart]     self._handle_recv()
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 421, in _handle_recv
[IPClusterStart]     self._run_callback(callback, msg)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 355, in _run_callback
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/stack_context.py", line 133, in wrapped
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 134, in <lambda>
[IPClusterStart]     self.registrar.on_recv(lambda msg: self.complete_registration(msg, connect, maybe_tunnel))
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 185, in complete_registration
[IPClusterStart]     connect(control_stream, control_addr)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 104, in connect
[IPClusterStart]     password=password,
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 113, in tunnel_connection
[IPClusterStart]     new_url, tunnel = open_tunnel(addr, server, keyfile=keyfile, password=password, paramiko=paramiko)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 139, in open_tunnel
[IPClusterStart]     tunnel = tunnelf(lport, rport, server, remoteip=ip, keyfile=keyfile, password=password)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 194, in openssh_tunnel
[IPClusterStart]     raise RuntimeError("tunnel '%s' failed to start"%(cmd))
[IPClusterStart] RuntimeError: tunnel 'ssh -i ~/.ssh/ip_dsa.pub -f -L 127.0.0.1:36302:127.0.0.1:32905 dwarf20 sleep 15' failed to start
[IPClusterStart] ERROR:root:Exception in I/O handler for fd <zmq.core.socket.Socket object at 0x156de68>
[IPClusterStart] Traceback (most recent call last):
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 288, in start
[IPClusterStart]     self._handlers[fd](fd, events)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/stack_context.py", line 133, in wrapped
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 381, in _handle_events
[IPClusterStart]     self._handle_recv()
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 421, in _handle_recv
[IPClusterStart]     self._run_callback(callback, msg)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 355, in _run_callback
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/zmq/eventloop/stack_context.py", line 133, in wrapped
[IPClusterStart]     callback(*args, **kwargs)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 134, in <lambda>
[IPClusterStart]     self.registrar.on_recv(lambda msg: self.complete_registration(msg, connect, maybe_tunnel))
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 185, in complete_registration
[IPClusterStart]     connect(control_stream, control_addr)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/parallel/engine/engine.py", line 104, in connect
[IPClusterStart]     password=password,
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 113, in tunnel_connection
[IPClusterStart]     new_url, tunnel = open_tunnel(addr, server, keyfile=keyfile, password=password, paramiko=paramiko)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 139, in open_tunnel
[IPClusterStart]     tunnel = tunnelf(lport, rport, server, remoteip=ip, keyfile=keyfile, password=password)
[IPClusterStart]   File "/astro/data/mjung/src/epd-7.1-2-rh5-x86_64/lib/python2.7/site-packages/IPython/external/ssh/tunnel.py", line 194, in openssh_tunnel
[IPClusterStart]     raise RuntimeError("tunnel '%s' failed to start"%(cmd))
[IPClusterStart] RuntimeError: tunnel 'ssh -i ~/.ssh/ip_dsa.pub -f -L 127.0.0.1:36302:127.0.0.1:32905 dwarf20 sleep 15' failed to start
[IPClusterStart] [IPEngineApp] Completed registration with id 0
[IPClusterStart] [IPEngineApp] Completed registration with id 1
[IPClusterStart] [IPEngineApp] Completed registration with id 8
[IPClusterStart] [IPEngineApp] Completed registration with id 9
[IPClusterStart] [IPEngineApp] Completed registration with id 2
[IPClusterStart] [IPEngineApp] Completed registration with id 3
[IPClusterStart] [IPEngineApp] Completed registration with id 4
[IPClusterStart] [IPEngineApp] Completed registration with id 10
[IPClusterStart] [IPEngineApp] Completed registration with id 13
[IPClusterStart] [IPEngineApp] Completed registration with id 11
[IPClusterStart] [IPEngineApp] Completed registration with id 5
[IPClusterStart] [IPEngineApp] Completed registration with id 12
[IPClusterStart] [IPControllerApp] registration::finished registering engine 3:'4043186a-89d7-4478-b33d-39387dd75b37'
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 3
[IPClusterStart] [IPControllerApp] registration::finished registering engine 7:'3433f58e-8481-4cb8-a484-02e511574176'
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 7
[IPClusterStart] [IPControllerApp] registration::finished registering engine 10:'b1a9dc74-98e3-465c-a885-d06596c1752f'
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 10
[IPClusterStart] [IPControllerApp] registration::finished registering engine 11:'d2bba957-f1b6-4001-baa1-69730ac6406c'
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 11
[IPClusterStart] [IPControllerApp] registration::finished registering engine 14:'637903a9-4d7a-4360-9d6d-e05a11a06f73'
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 14
[IPClusterStart] [IPControllerApp] registration::finished registering engine 8:'82b4c9ac-8da4-4db4-9da2-ce696f1a844d'
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 8
[IPClusterStart] [IPControllerApp] registration::finished registering engine 1:'bdb17be7-934c-4310-bf27-9ad3e4f63389'
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 1
[IPClusterStart] [IPControllerApp] registration::finished registering engine 2:'e9fe9d0e-7659-4719-ae72-06237cdec38a'
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 2
[IPClusterStart] [IPControllerApp] registration::finished registering engine 5:'5069b0bc-f3f0-4f05-9bdc-85b8f1994cae'
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 5
[IPClusterStart] [IPControllerApp] registration::finished registering engine 12:'ca00bda8-3eb3-4868-a041-d0a75092f789'
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 12
[IPClusterStart] [IPControllerApp] registration::finished registering engine 4:'2d647e60-7f44-4a9d-81d0-d7a14e627ead'
[IPClusterStart] [IPEngineApp] Completed registration with id 14
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 4
[IPClusterStart] [IPControllerApp] registration::finished registering engine 13:'0f2b313d-d49b-4e87-90fc-6e919f1dd67b'
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 13
[IPClusterStart] [IPControllerApp] registration::finished registering engine 9:'89e8c0c6-a5ba-46f3-9aff-f555085d3693'
[IPClusterStart] [IPControllerApp] engine::Engine Connected: 9
[IPClusterStart] [IPControllerApp] registration::purging stalled registration: 6
[IPClusterStart] IPython cluster: stopping
[IPClusterStart] Stopping Engines...
[IPClusterStart] CRITICAL:root:Got signal 2, terminating children...
[IPClusterStart] Process 'ssh' stopped: {'pid': 12252, 'exit_code': 255}
[IPClusterStart] Killed by signal 2.
[IPClusterStart] Process 'ssh' stopped: {'pid': 12255, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12256, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12257, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12258, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12251, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12253, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12254, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12242, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12243, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12245, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12246, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12247, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12248, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12249, 'exit_code': 255}
[IPClusterStart] Process 'ssh' stopped: {'pid': 12250, 'exit_code': 255}
[IPClusterStart] Process 'engine set' stopped: {'pluto8': {'pid': 12251, 'exit_code': 255}, 'pluto9': {'pid': 12252, 'exit_code': 255}, 'pluto0': {'pid': 12242, 'exit_code': 255}, 'pluto1': {'pid': 12243, 'exit_code': 255}, 'pluto2': {'pid': 12245, 'exit_code': 255}, 'pluto3': {'pid': 12246, 'exit_code': 255}, 'pluto4': {'pid': 12247, 'exit_code': 255}, 'pluto5': {'pid': 12248, 'exit_code': 255}, 'pluto6': {'pid': 12249, 'exit_code': 255}, 'pluto7': {'pid': 12250, 'exit_code': 255}, 'pluto12': {'pid': 12255, 'exit_code': 255}, 'pluto13': {'pid': 12256, 'exit_code': 255}, 'pluto10': {'pid': 12253, 'exit_code': 255}, 'pluto11': {'pid': 12254, 'exit_code': 255}, 'pluto14': {'pid': 12257, 'exit_code': 255}, 'pluto15': {'pid': 12258, 'exit_code': 255}}
[IPClusterStart] CRITICAL:root:Got signal 2, terminating children...
[IPClusterStart] Process '/home/mjung/src/epd-7.1-2-rh5-x86_64/bin/python' stopped: {'pid': 12223, 'exit_code': 0}
[IPClusterStart] Removing pid file: /home/mjung/.config/ipython/profile_ssh/pid/ipcluster.pid
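
A note on the tunnel command in the traceback above: it passes `-i ~/.ssh/ip_dsa.pub` to ssh. Two things stand out, and either may be related to the "Identity file ... not accessible" warning. The path still contains a literal `~`, which no shell expands when the command is started directly as a subprocess, and the file name ends in `.pub`, whereas `ssh -i` expects the private key. The following is only a minimal sketch of a pre-flight check. The helper name `check_identity_file` is hypothetical and not part of IPython, and it does not explain why only some of the 16 engines fail.

```python
import os


def check_identity_file(path):
    """Return (expanded_path, problems) for a value meant for `ssh -i`.

    Hypothetical helper for illustration; not part of IPython.
    """
    problems = []
    # Expand a leading '~' ourselves, since no shell will do it for us
    # when the ssh command is launched directly as a subprocess.
    expanded = os.path.expanduser(path)
    if path.startswith("~"):
        problems.append("path starts with '~'; pass the expanded path to ssh")
    # `ssh -i` wants the private key, not the matching .pub file.
    if expanded.endswith(".pub"):
        problems.append("looks like a public key; ssh -i expects the private key")
    if not os.path.isfile(expanded):
        problems.append("file not found: %s" % expanded)
    return expanded, problems


if __name__ == "__main__":
    # The path below is the one that appears in the log above.
    expanded, problems = check_identity_file("~/.ssh/ip_dsa.pub")
    print("checked: %s" % expanded)
    for problem in problems:
        print("warning: %s" % problem)
```

If the check reports problems, pointing the ssh configuration at the expanded private key path (for example the file without the `.pub` suffix) would be the first thing to try.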

