[IPython-User] ipython cluster engines and fd limit
Tue Sep 11 10:27:34 CDT 2012
Recently I ran into some problems with an IPython cluster (IPython version
0.13) that might be related to the maximum number of file descriptors, as
discussed in an earlier thread. However, I do not get any specific error
messages that would confirm this, so I'd like to describe the problem here in a
bit more detail.
The limit of open files per process on our machines is 1024 (and we cannot
change this). I start the IPython cluster by manually starting the ipcontroller
and the ipengines (i.e. without using ipcluster) with the options "--debug [...]".
From the discussion in that thread, it appears that each engine occupies 3
connections plus some engine-independent number of connections. Hence, 250
engines (250 × 3 = 750 connections, well below 1024) seemed to be a safe limit
for our machines.
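The budget reasoning above can be written as a small calculation. This is a
sketch under assumptions: the 3 descriptors per engine come from the earlier
thread, and FIXED_OVERHEAD is a hypothetical placeholder for the
engine-independent descriptors, which would have to be measured on the actual
controller.

```python
import resource

# Assumption taken from the earlier thread: about 3 connections per engine.
FDS_PER_ENGINE = 3
# Hypothetical placeholder for engine-independent descriptors
# (listening sockets, log files, ...); measure it on your own system.
FIXED_OVERHEAD = 100


def max_engines(fd_limit, per_engine=FDS_PER_ENGINE, overhead=FIXED_OVERHEAD):
    """Largest engine count whose descriptors still fit under fd_limit."""
    if fd_limit <= overhead:
        return 0
    return (fd_limit - overhead) // per_engine


if __name__ == "__main__":
    # The soft limit is what a process actually runs into; an unprivileged
    # process may only raise it up to the hard limit.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit:", soft, "hard limit:", hard)
    print("engines fitting under the soft limit:", max_engines(soft))
```

With a limit of 1024 and the placeholder overhead of 100, this gives
(1024 - 100) // 3 = 308 engines, so the result depends heavily on the real
overhead.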
However, we observe that often not all engines connect to the controller, i.e.
Client.ids contains only a subset. (At other times all engines at least appear
to connect successfully, in the sense that they all show up in Client.ids.) The
ipengine-*.log files show no errors; they all end with "[IPEngineApp] Completed
registration with id xxx". In the ipcontroller-*.log I see events of the type
"[IPControllerApp] registration::register_engine(...)" for all engines, but only
for the subset that appears in Client.ids do I also see "[IPControllerApp]
registration::finished registering engine ...".
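To quantify how many registrations start but never finish, one could count the
two controller log messages quoted above. This sketch only matches those
substrings; it assumes nothing else about the log format, and the script name
count_registrations.py is hypothetical.

```python
import sys

# Substrings copied from the ipcontroller log lines quoted in this post.
STARTED = "registration::register_engine("
FINISHED = "registration::finished registering engine"


def registration_counts(lines):
    """Return (started, finished) counts of engine registration events."""
    started = finished = 0
    for line in lines:
        if STARTED in line:
            started += 1
        elif FINISHED in line:
            finished += 1
    return started, finished


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_registrations.py ipcontroller-XXXX.log
    with open(sys.argv[1]) as log:
        started, finished = registration_counts(log)
    print("started:", started, "finished:", finished,
          "stuck:", started - finished)
```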
If I increase the number of engines to 300, some engines additionally report
registration timeouts, even with the timeout set to 100 seconds.
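To distinguish slow registration from registration that never completes, one
could poll the client for a while before giving up. This sketch takes any
callable returning the current engine ids, so it can be tried without a cluster;
the function name wait_for_engines is hypothetical, and the commented usage
assumes the IPython 0.13 import path IPython.parallel.Client.

```python
import time


def wait_for_engines(get_ids, expected, timeout=100.0, poll=1.0):
    """Poll get_ids() until it reports `expected` engines or timeout expires.

    Returns the last list of ids seen, so the caller can check how many
    engines actually registered.
    """
    deadline = time.monotonic() + timeout
    ids = list(get_ids())
    while len(ids) < expected and time.monotonic() < deadline:
        time.sleep(poll)
        ids = list(get_ids())
    return ids


# Usage against a running cluster (IPython 0.13):
#   from IPython.parallel import Client
#   rc = Client()
#   ids = wait_for_engines(lambda: rc.ids, expected=250)
#   print(len(ids), "of 250 engines registered")
```

If the count stops growing well before the timeout, the missing engines are
probably stuck rather than merely slow.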
My feeling is that this behavior could very well be explained by the limit on
open files (i.e. the engines might be able to open a connection for
registration, but not the remaining ones). However, I do not get any explicit
"Too many open files ..." error message as described in that earlier thread,
which puzzles me.
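One way to test the file-descriptor hypothesis directly, without relying on an
error message, is to watch how many descriptors the ipcontroller process holds
while the engines register. This sketch is Linux-only: it assumes the /proc
filesystem and reads /proc/PID/fd and the "Max open files" line of
/proc/PID/limits. The function names are hypothetical.

```python
import os


def open_fd_count(pid):
    """Number of file descriptors currently open in process `pid`."""
    return len(os.listdir("/proc/%d/fd" % pid))


def fd_soft_limit(pid):
    """Soft 'Max open files' limit of process `pid`, or None if unlimited."""
    with open("/proc/%d/limits" % pid) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    return None


if __name__ == "__main__":
    import sys
    # Pass the ipcontroller PID; without an argument this process is inspected.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print("pid %d: %d open fds, soft limit %s"
          % (pid, open_fd_count(pid), fd_soft_limit(pid)))
```

If the count for the controller approaches its soft limit as engines stop
registering, that would support the explanation even without an explicit error.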
Any ideas? Can I further increase the debug output?