[IPython-User] Thread-safety of IPython.kernel.client

Søren Gammelmark gammelmark@phys.au...
Thu Aug 19 02:50:24 CDT 2010


Hi everyone

First of all, thank you for an extremely useful tool in IPython and 
its ability to help with cluster computing!

To what degree is IPython.kernel.client thread-safe (i.e. safe in the 
sense of the threading module)? I have a problem when I run several 
threads, each of which sends commands to an individual ipengine from 
a Queue.Queue. It seems that one of the engines is getting the same 
commands twice. From the log I have something like this for ipengine id 2:

2010-08-18 19:24:36+0200 [-] Performing reset on 2
2010-08-18 19:24:36+0200 [-] Performing reset on 2
2010-08-18 19:24:36+0200 [-] Performing push on 2
2010-08-18 19:24:36+0200 [-] Performing push on 2
2010-08-18 19:24:36+0200 [-] Performing execute on 2
2010-08-18 19:24:36+0200 [-] Performing execute on 2

The other engines go through a single reset-push-execute cycle 
(which is consistent with my program):

2010-08-18 19:31:19+0200 [-] Performing reset on 3
2010-08-18 19:31:19+0200 [-] Performing push on 3
2010-08-18 19:31:19+0200 [-] Performing execute on 3

I suspect that this messes up the pull I have to do later (if I reset 
before the pull, I cannot get the stuff back). Another, possibly 
related, issue is a QueueCleared exception. The funny thing in these 
cases is that the system complains about an exception from e.g. 'push', 
but the QueueCleared is from the 'pull' (and similarly for pull/execute):

one or more exceptions from call to method: push
[Engine Exception]QueueCleared: 'pull' ('filename',) {}
[Engine Exception]

...

one or more exceptions from call to method: pull
[Engine Exception]QueueCleared: 'execute' ('filename = task.run()',) {}
[Engine Exception]
No traceback available

Does this make any sense, or do you need more information? For the 
record, the problem only arises when running on multiple nodes. I have 
tested the programs on my own machine (with multiple cores and 
ipengines), where it seems to work without problems. The problem also 
only happened three times during a 12-hour run (and late in the run at 
that), so it is not very systematic. Therefore I have no idea where to 
start investigating this.
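
To make the setup concrete, here is a minimal sketch of the pattern 
described above: one thread per engine, each taking work from a shared 
queue and running a reset-push-execute-pull cycle. It is not the actual 
program. StubClient is a hypothetical stand-in that only records calls 
(its method names and signatures are assumptions, not the 
IPython.kernel.client API), and the lock around every client call is one 
possible precaution in case the client object is not thread-safe.

```python
import queue
import threading


class StubClient:
    """Hypothetical stand-in for a multi-engine client; records every call."""

    def __init__(self):
        self.log = []         # (engine_id, command) in arrival order
        self.namespaces = {}  # engine_id -> names living on that engine

    def reset(self, target):
        self.log.append((target, "reset"))
        self.namespaces[target] = {}

    def push(self, values, target):
        self.log.append((target, "push"))
        self.namespaces[target].update(values)

    def execute(self, code, target):
        self.log.append((target, "execute"))
        exec(code, {}, self.namespaces[target])

    def pull(self, name, target):
        self.log.append((target, "pull"))
        return self.namespaces[target][name]


def locked(lock, method, *args):
    """Call a client method while holding the lock shared by all threads."""
    with lock:
        return method(*args)


def worker(engine_id, client, lock, tasks, results):
    """Serve one engine: take tasks until the queue is empty."""
    while True:
        try:
            task_id, x = tasks.get_nowait()
        except queue.Empty:
            return
        locked(lock, client.reset, engine_id)
        locked(lock, client.push, {"x": x}, engine_id)
        locked(lock, client.execute, "result = x * 2", engine_id)
        results[task_id] = locked(lock, client.pull, "result", engine_id)


def run(num_engines=3, num_tasks=12):
    client = StubClient()
    lock = threading.Lock()
    tasks = queue.Queue()
    for task_id in range(num_tasks):
        tasks.put((task_id, task_id))
    results = {}
    threads = [
        threading.Thread(target=worker, args=(e, client, lock, tasks, results))
        for e in range(num_engines)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return client, results
```

With this structure each engine should only ever see whole 
reset-push-execute-pull cycles, so a doubled command in the engine log 
would point at the layer below the threads. Note that serializing every 
call through one lock costs parallelism if the client calls block; 
whether one client object per thread would be a safe alternative is 
part of the question.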

Hope you can help
Søren Gammelmark

P.S. If you are wondering why I do not use the TaskClient, it is because 
I would like to do extra postprocessing on the results after the tasks 
are finished, i.e. transferring data between files and networks. If you 
know of an obvious way to do this with less chance of error, I would be 
quite interested.
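
One generic way to keep such postprocessing separate from the workers, 
sketched with the standard library only (this is not a TaskClient 
feature; compute and handle are hypothetical placeholders for the task 
and the postprocessing step): workers put finished results on a second 
queue, and a single thread consumes them, so only that thread touches 
files or the network.

```python
import queue
import threading

_DONE = object()  # sentinel telling the postprocessing thread to stop


def postprocess_loop(finished, handle):
    """Consume finished results one at a time until the sentinel arrives."""
    while True:
        item = finished.get()
        if item is _DONE:
            break
        handle(item)


def run_with_postprocessing(inputs, compute, handle, num_workers=3):
    todo = queue.Queue()
    for item in inputs:
        todo.put(item)
    finished = queue.Queue()

    def worker():
        # Each worker computes results and hands them over; it never
        # performs the postprocessing itself.
        while True:
            try:
                item = todo.get_nowait()
            except queue.Empty:
                return
            finished.put((item, compute(item)))

    post = threading.Thread(target=postprocess_loop, args=(finished, handle))
    post.start()
    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    finished.put(_DONE)  # all results are queued ahead of the sentinel
    post.join()
```

For example, run_with_postprocessing(range(5), lambda x: x * x, 
collected.append) fills the list collected with (input, result) pairs, 
in whatever order the workers finish.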

