[IPython-dev] pyzmq problems in sending shell messages to a kernel

Jason Grout jason-sage@creativetrax....
Wed Feb 12 12:45:32 CST 2014


Hi everyone,

I'm trying to track down a problem we're seeing in the Sage cell server
when sending computation messages to an IPython kernel.  This may turn
out to be a problem with how we use pyzmq or zmq, so apologies in
advance if it ends up being off-topic for this list.

The tl;dr version: in some very sporadic cases, pyzmq appears to send a
message (an execute_request) to a kernel's shell channel TCP port on
localhost, but Wireshark never registers the message on the wire, and
the kernel that is supposed to receive it never acts on it.  My
question is: does anyone have suggestions for debugging this or
narrowing down the problem?

The (abbreviated, simplified) long version: in the Sage cell server, we
start up a number of IPython kernels that we keep waiting around for
computations.  When a computation is requested, we hook up the kernel's
shell/iopub/heartbeat channels (i.e., create pyzmq ZMQStream objects
connected to the TCP ports for those channels), send an
execute_request, and assemble an answer for the user from the output
coming back on the iopub channel.

When the system is under moderate load, every now and then (roughly
once every 300 computations), we send an execute_request message to one
of these waiting kernels, and the zmq socket code claims that it sent
the message, but Wireshark shows no such message in the raw TCP
traffic, and the kernel acts as if it never received anything.  We
didn't change the high water mark for zmq, and I'm running zmq 3.2.2
and pyzmq 14.0.1.  I've spent a long time narrowing the issue down to a
zmq message not being sent even though pyzmq seems to think it sent it.
Does anyone have suggestions for narrowing this down further, or ideas
about possible causes?
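
For concreteness, here is a rough sketch of the frames that make up one
execute_request on the shell channel.  This is not the cell server's
actual code: make_execute_request is a hypothetical helper, the
"cellserver" username is a placeholder, the exact content fields depend
on the protocol version, and I'm assuming the kernel signs messages
with HMAC-SHA256 using the key from its connection file.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

DELIM = b"<IDS|MSG>"  # marks the end of routing identities on the wire


def make_execute_request(code, key, session=None):
    """Return the multipart frames for one execute_request.

    Hypothetical helper; `key` is the bytes key from the kernel's
    connection file, assumed to be used with HMAC-SHA256.
    """
    header = {
        "msg_id": str(uuid.uuid4()),
        "username": "cellserver",  # placeholder name
        "session": session or str(uuid.uuid4()),
        "msg_type": "execute_request",
        "date": datetime.now(timezone.utc).isoformat(),
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
    }
    # Order matters: header, parent_header, metadata, content.
    parts = [json.dumps(d).encode("utf-8") for d in (header, {}, {}, content)]
    # The signature covers the four serialized dicts, in that order.
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    signature = mac.hexdigest().encode("ascii")
    return [DELIM, signature] + parts
```

With pyzmq, these frames would go out through send_multipart(frames) on
a DEALER socket (or a ZMQStream wrapping one) connected to the kernel's
shell port.  Keeping the msg_id from the header makes it easier to
match a "sent" message against what does or does not show up in a
packet capture.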

I realize that my setup is a bit complicated, and I've tried to simplify 
the issues (but hopefully not too much).  Any suggestions or help would 
be appreciated.  The next thing I'm going to do is (a) upgrade zmq to 
4.x, and (b) insert some debugging statements in the zmq library itself 
to see if the C zmq library thinks it sent the message.
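
A Python-side check that could complement (b) is to watch connection
events on the shell socket, to see whether it was actually connected at
the moment of the send.  The following is only a sketch: it assumes
pyzmq's get_monitor_socket() and zmq.utils.monitor.recv_monitor_message,
which as far as I know need libzmq 4.x, and the two helper functions
are hypothetical names of my own.

```python
import zmq
from zmq.utils.monitor import recv_monitor_message


def connect_with_monitor(ctx, url):
    """Create a DEALER socket plus a monitor socket, then connect to url."""
    sock = ctx.socket(zmq.DEALER)
    # The monitor is a PAIR socket that receives this socket's events.
    monitor = sock.get_monitor_socket()
    sock.connect(url)
    return sock, monitor


def collect_monitor_events(monitor, timeout_ms=1000):
    """Read events until the monitor stays quiet for timeout_ms."""
    events = []
    while monitor.poll(timeout_ms):
        evt = recv_monitor_message(monitor)
        events.append((evt["event"], evt["endpoint"]))
    return events
```

If zmq.EVENT_CONNECTED never shows up for the shell endpoint before the
send, or zmq.EVENT_DISCONNECTED appears around it, that would point at
the connection state rather than at the send call itself.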

Thanks,

Jason

