[IPython-User] ipcluster: Too many open files (tcp_listener.cpp:213)
Jon Olav Vik
jonovik@gmail....
Tue Jun 19 03:06:37 CDT 2012
Fernando Perez <fperez.net <at> gmail.com> writes:
> On Tue, Jun 19, 2012 at 12:39 AM, MinRK <benjaminrk <at> gmail.com> wrote:
> >
> > This happens at the zeromq level - IPython has no way of controlling this.
>
> Question, what's the number of open fds per engine right now (plus any
> others opened by the hub)?
My guess is four per engine, plus some for the hub etc. Repeated tests yesterday
gave me 240 engines from 10 nodes with 24 processors each, whereas the 11th node
seemed to break the controller's back. That is consistent with a ceiling of
`ulimit -n` / 4 = 256 engines (i.e. a limit of 1024 open files).
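To make that arithmetic easy to rerun on another machine, here is a minimal sketch. It assumes my guess of four file descriptors per engine; `hub_reserve` is a hypothetical allowance for the hub's own descriptors, not a measured figure. The `resource` module is Unix-only.

```python
import resource


def max_engines(fd_limit, fds_per_engine=4, hub_reserve=0):
    """Estimate how many engines fit under an open-file limit.

    fds_per_engine=4 is a guess from observation, not a documented value;
    hub_reserve is a hypothetical allowance for the hub's own descriptors.
    """
    usable = max(fd_limit - hub_reserve, 0)
    return usable // fds_per_engine


def current_fd_limit():
    """Soft open-file limit of this process (what `ulimit -n` reports).

    Returns None when the limit is unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft
```

Run it on the controller's host, since that is where the limit bites: `max_engines(current_fd_limit())`.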
> At least if we advertise what the formula
> is, users could tweak their job submission scripts to test their
> environment and only request a total number of engines that's safe
> given the constraints they find...
Googling around, I see that some batch schedulers can limit the number of
concurrent CPUs per user, which would be a perfect fit here.
(My problem is that my scheduled jobs quickly burn out when engines cannot
connect, and I need to wait until some connections have died, then schedule
more. It occurs to me now that I could perhaps have my main loop check
len(c.ids) and release batch queue holds once there are vacancies.)
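A rough sketch of that hold-releasing idea, untested against a real scheduler. `release_hold` is a hypothetical callback that would wrap whatever command your batch system uses to release a held job, and `capacity` is the engine ceiling you found for your controller.

```python
def holds_to_release(connected, capacity, engines_per_job):
    """How many held jobs can start without exceeding the engine capacity."""
    vacancies = max(capacity - connected, 0)
    return vacancies // engines_per_job


def release_when_vacant(count_engines, release_hold, held_jobs,
                        capacity, engines_per_job):
    """Release as many held jobs as currently fit; return those still held.

    count_engines: callable returning the number of connected engines,
                   e.g. lambda: len(c.ids)
    release_hold:  hypothetical callback that releases one held job
    """
    n = holds_to_release(count_engines(), capacity, engines_per_job)
    for job in held_jobs[:n]:
        release_hold(job)
    return held_jobs[n:]
```

The main loop would call `release_when_vacant` every so often, passing `lambda: len(c.ids)` as `count_engines`, and carry the returned list over to the next iteration.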
A good workaround might be to have multiple ipclusters running:
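    from IPython.parallel import Client  # packaged as ipyparallel in later releases
    import numpy as np

    jsonfiles = [...]  # one connection file per ipcluster
    c = [Client(i) for i in jsonfiles]
    lv = [i.load_balanced_view() for i in c]

    def do(workpiece):
        pass

    pdo = [i.parallel(ordered=False, retries=10)(do) for i in lv]
    # Insert clever coordination of multiple clusters here...
    # Simple example:
    workpieces = ...
    workpieces = np.array_split(workpieces, len(c))
    # Not named "async": that is a reserved word in Python 3.7+
    async_results = [i.map(j) for i, j in zip(pdo, workpieces)]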
Engine nodes could be spread across clusters according to their CPU count, for
example.
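One way to do that spreading, as a sketch only: a first-fit assignment that assumes one engine per CPU and a per-cluster engine ceiling (`capacity`) that you have determined yourself. The function and node names are made up for illustration.

```python
def assign_nodes(node_cpus, capacity):
    """First-fit assignment of nodes to clusters.

    node_cpus: list of (node_name, cpu_count) pairs; one engine per CPU assumed
    capacity:  maximum number of engines a single controller should accept
    Returns a list of clusters, each a list of node names.
    """
    clusters = []  # node names per cluster
    loads = []     # engines already assigned per cluster
    for node, cpus in node_cpus:
        if cpus > capacity:
            raise ValueError("node %s alone exceeds capacity" % node)
        for k, load in enumerate(loads):
            if load + cpus <= capacity:
                clusters[k].append(node)
                loads[k] += cpus
                break
        else:
            # No existing cluster has room: start a new one
            clusters.append([node])
            loads.append(cpus)
    return clusters
```

With eleven 24-CPU nodes and a capacity of 240, the first ten nodes would go to one cluster and the eleventh would start a second.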