[IPython-User] Thread-safety of IPython.kernel.client

Brian Granger ellisonbg@gmail....
Mon Aug 23 22:59:51 CDT 2010


On Thu, Aug 19, 2010 at 12:50 AM, Søren Gammelmark
<gammelmark@phys.au.dk> wrote:
> Hi everyone
>
> First of all, thank you for an extremely useful tool in IPython and
> its ability to help with cluster computing!
>
> To what degree is IPython.kernel.client thread-safe (i.e. safe in the
> sense of the threading module)? I have a problem when I run several
> threads, each of which sends commands to an individual ipengine from
> a Queue.Queue. It seems like one of the engines is getting the same
> commands twice. From the log I have something like this for ipengine id 2:

The client is not thread safe in any way at all.
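
One common mitigation for a client that is not thread safe is to funnel every call through a single lock. The following is a minimal sketch of that idea, not a confirmed fix for IPython.kernel.client: `FakeClient`, `LockedClient`, and `run_demo` are hypothetical names made up for illustration, and `FakeClient` only stands in for the real client object. It is written for Python 3.

```python
import threading


class FakeClient:
    """Hypothetical stand-in for a non-thread-safe client (not the IPython API)."""

    def __init__(self):
        self.log = []

    def execute(self, command, target):
        # Record which command was sent to which engine id.
        self.log.append((target, command))
        return "done"


class LockedClient:
    """Serializes every call to the wrapped client with one lock."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()

    def call(self, method_name, *args, **kwargs):
        # Only one thread at a time may be inside the wrapped client.
        with self._lock:
            return getattr(self._client, method_name)(*args, **kwargs)


def run_demo(n_threads=4):
    raw = FakeClient()
    shared = LockedClient(raw)

    def worker(engine_id):
        # Each thread drives one engine through reset, push, execute.
        for step in ("reset", "push", "execute"):
            shared.call("execute", step, engine_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return raw.log
```

Whether a lock alone is enough depends on what the real client does internally, which this thread does not settle.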

> 2010-08-18 19:24:36+0200 [-] Performing reset on 2
> 2010-08-18 19:24:36+0200 [-] Performing reset on 2
> 2010-08-18 19:24:36+0200 [-] Performing push on 2
> 2010-08-18 19:24:36+0200 [-] Performing push on 2
> 2010-08-18 19:24:36+0200 [-] Performing execute on 2
> 2010-08-18 19:24:36+0200 [-] Performing execute on 2
>
> Where the other engines go through a single reset-push-execute cycle
> (which is consistent with my program).
>
> 2010-08-18 19:31:19+0200 [-] Performing reset on 3
> 2010-08-18 19:31:19+0200 [-] Performing push on 3
> 2010-08-18 19:31:19+0200 [-] Performing execute on 3
>
> I suspect that this messes up the pull I have to do later (if I reset
> before the pull, I cannot get the stuff back). Another, and possibly
> related, issue is a QueueCleared exception. The funny thing in these
> cases is that the system complains about an exception from e.g. 'push',
> but the QueueCleared is from the 'pull' (and similarly for pull/execute):
>
> one or more exceptions from call to method: push
> [Engine Exception]QueueCleared: 'pull' ('filename',) {}
> [Engine Exception]
>
> ...
>
> one or more exceptions from call to method: pull
> [Engine Exception]QueueCleared: 'execute' ('filename = task.run()',) {}
> [Engine Exception]
> No traceback available
>
> Does this make any sense or do you need more information? For the
> record, the problem only arises when running on multiple nodes. I have
> tested the programs on my own machine (with multiple cores and
> ipengines), where it seems to work without problems. The problem also
> only happens three times during a 12-hour run (and late in the run at
> that), so it is not very systematic. Therefore I have no idea where to
> start investigating this.

This sounds like a thread safety issue.  Do you really need to have
multiple threads that use the client like this?  The only way we could
help in a more detailed manner is if you can provide a very simple
example that replicates the problem.  But, with things like threading
problems, that might be difficult.
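
One way to avoid having multiple threads use the client at all is to let exactly one thread own it and have the other threads hand their requests to that thread through a queue. The sketch below illustrates the pattern under stated assumptions: `FakeClient`, `client_owner`, `submit`, and `run_demo` are hypothetical names, `FakeClient` is only a placeholder for the real client, and the code targets Python 3 (where the `Queue` module is named `queue`).

```python
import queue
import threading


class FakeClient:
    """Hypothetical stand-in for the real client (not the IPython API)."""

    def __init__(self):
        self.log = []

    def execute(self, command, target):
        self.log.append((target, command))
        return "result from engine %d" % target


def client_owner(client, requests):
    """Runs in exactly one thread; the only code that touches `client`."""
    while True:
        item = requests.get()
        if item is None:  # sentinel: shut down
            break
        method_name, args, reply = item
        try:
            reply.put((True, getattr(client, method_name)(*args)))
        except Exception as exc:
            reply.put((False, exc))


def submit(requests, method_name, *args):
    """Called from worker threads; blocks until the owner thread answers."""
    reply = queue.Queue(maxsize=1)
    requests.put((method_name, args, reply))
    ok, value = reply.get()
    if not ok:
        raise value
    return value


def run_demo(n_workers=3):
    client = FakeClient()
    requests = queue.Queue()
    owner = threading.Thread(target=client_owner, args=(client, requests))
    owner.start()
    results = {}

    def worker(engine_id):
        submit(requests, "execute", "reset", engine_id)
        submit(requests, "execute", "push", engine_id)
        results[engine_id] = submit(requests, "execute", "execute", engine_id)
        # Any postprocessing of results[engine_id] can run here, in the
        # worker thread, without touching the client.

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    requests.put(None)
    owner.join()
    return client.log, results
```

With this layout the worker threads never call the client directly, so the per-engine order of reset, push, and execute is decided by a single thread.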

Brian

> Hope you can help
> Søren Gammelmark
>
> P.S.: If you are wondering why I do not use the TaskClient it is because
> I would like to do extra postprocessing on the results after the tasks
> are finished, i.e. transferring data between files and networks. If you
> know of an obvious way to do this with less chance of error I would be
> quite interested.



-- 
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu
ellisonbg@gmail.com
