[IPython-User] notebook and parallel work on Windows7 and Linux
Wed Jan 30 23:35:50 CST 2013
On Wed, Jan 30, 2013 at 4:43 AM, Fritz Payr wrote:
> I am trying to run an IPython notebook and (with it) use a Linux-cluster
> for long-running parallel work.
> My PC runs Windows7, and the network connecting it with the cluster
> tends to shut down ssh-connections from time to time. I've tried running
> the notebook on the cluster (with a ssh-tunnel bringing it to a browser
> on my PC) and on my PC (with a ipcontroller running on the cluster,
> connected via the built-in ssh).
First, let me express my gratitude that you are giving the Windows SSH code
some exercise (you may be the first).
I'm glad it at least sort of works.
> Both setups have problems:
> If I have the notebook running on the cluster ssh-tunneling to a browser
> on my PC I don't know how to "reconnect" the browser to the "old
> session" when the tunnel breaks down.
This is probably the easier case. If this connection is lost, you can just
open a new tunnel:

    ssh -f -N -L 127.0.0.1:LPORT:HOST:RPORT SERVER

(or whatever the Windows equivalent) - precisely the same call as you used
to start the first tunnel.
The Notebook may lose *websocket* connections (execution, output) when the
tunnel goes down,
but you should at least be able to do things like save the notebook after
re-establishing the tunnel
and then refresh the page to get new websocket connections.
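For instance, if the notebook is served on the cluster's default port 8888,
re-establishing the tunnel might look like this (the port, hostname, and
username below are assumptions, not values from your setup):

```shell
# Re-open the same local-forward tunnel after the network drops it.
# 8888 is the notebook's default port; 'user@cluster' is a placeholder.
ssh -f -N -L 127.0.0.1:8888:127.0.0.1:8888 user@cluster
```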
This is one of the things that is probably better in current master than
0.13.1 - websockets will reconnect automatically,
and if the reconnect fails you get a dialog that lets you reconnect
manually. This makes it harder to lose websocket connections
in a way that forces a page refresh.
> If the notebook is running on the PC ssh-ing with the ipcontroller on
> the cluster, it seems they silently "loose contact" during long tasks.
> Then I can see the cluster finish its tasks (all 80 x 16 hours!), but
> the notebook remains "busy" and doesn't receive anything. - No idea how
> I could retrieve the results from the ipcontroller, the engines, or
> anyone else!
The Hub can remember all tasks, so you can actually request past results
with calls like `client.get_result(msg_id)`. So you can do things like
submit a bunch of tasks that will take days,
then in a later, totally different session, retrieve the results.
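A minimal sketch of that later retrieval, assuming an IPython.parallel
cluster is still running; the profile name and msg_id here are placeholders,
not real values:

```python
# Sketch: reconnect to the Hub in a fresh session and fetch a past result.
# Requires a live cluster; the 'ssh' profile and msg_id are placeholders.
from IPython.parallel import Client

rc = Client(profile='ssh')        # reconnect to the running controller
ar = rc.get_result('<msg_id>')    # AsyncResult for the earlier task
result = ar.get()                 # blocks; raises if the task itself failed
```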
> Is it bad to have tasks that run for so long?
Not generally, but it may be if you have intermittent connection issues.
> Or is there a standard way to distribute notebook and cluster on an
> "unstable network" without these problems? - Surely it doesn't make
> sense to run a browser on the cluster?!
No, there is no general solution to unreliable networks.
The real solution is to fix whatever erroneous configuration is killing
your connections.
If you don't trust your connection, the best workaround is what I alluded
to above:
1. submit tasks in a single session
2. record the msg_ids associated with that work in some persistent way
(files, in the notebook itself, etc.)
3. retrieve results from the Hub via msg_ids (or the filesystem, or anything
else)
If you write your code such that submitting work and retrieving results are
totally disconnected steps,
then connections lost in between steps 2 and 3 should be less problematic.
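As a hedged sketch of that workflow, here is the bookkeeping half (steps 2
and 3); the cluster calls are commented out because they need a running
controller, and every name and id below is a placeholder:

```python
import json

# Step 1 (commented out: needs a live cluster):
# from IPython.parallel import Client
# rc = Client(profile='ssh')
# ar = rc[:].map_async(long_task, inputs)
# msg_ids = ar.msg_ids

msg_ids = ['a1b2c3', 'd4e5f6']   # placeholder ids from a real submission

# Step 2: record the msg_ids somewhere persistent
with open('msg_ids.json', 'w') as f:
    json.dump(msg_ids, f)

# Step 3, in a later, totally different session: load the ids back
with open('msg_ids.json') as f:
    saved = json.load(f)

# results = [rc.get_result(mid).get() for mid in saved]
print(saved)   # -> ['a1b2c3', 'd4e5f6']
```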
> A smaller issue I noticed in the second configuration is that while
> IPython's paramiko can work nicely with PuTTY (rather: Pageant) to find
> private ssh keys, it doesn't recognize PuTTY's "known hosts" or remember
> its own. (Apparently PuTTY stores them in the registry, but paramiko
> would like them saved to a file by the application?) So I keep getting
> warnings about an unknown host (first time in the browser-notebook,
> later in its cmd-output) although it should be well-known by now.
That sounds like an issue for Paramiko. I don't know of anything IPython
should be doing here.