[IPython-User] Large Parallel Runs
Thu Aug 30 17:06:12 CDT 2012
I'm currently running into trouble with ipcontroller, which seems to
choke when given too many tasks on a large cluster, and I was
wondering whether anyone else has experienced this.
I'm using a cluster with the following configuration:
* ipcontroller running on one machine, alongside 7 ipengines
* ipengines running on 19 other machines, with between 2 and 8
instances per machine (1 per core), all connecting to ipcontroller via
ssh; there are 72 ipengines in total
* the client running on my laptop, also connected via ssh (my laptop
is also one of the 19 machines)
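For reference, my client-side setup looks roughly like this (a sketch;
the ssh hostname is a placeholder, and it degrades gracefully when no
controller is reachable):

```python
# Sketch of connecting the client over ssh (IPython 0.13-era API).
# "me@controller-host" is a placeholder for my actual controller machine.
try:
    from IPython.parallel import Client
    rc = Client(sshserver="me@controller-host")  # tunnel to ipcontroller
    lview = rc.load_balanced_view()              # load-balanced scheduler
    engines = len(rc.ids)                        # should report 72 engines
except Exception:  # ImportError or no reachable controller
    engines = 0

status = "connected to %d engines" % engines
print(status)
```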
On this setup, submitting 1600 tasks worked relatively well.
However, submitting 16000 of the same tasks does not: at least twenty
minutes after starting, perhaps only a few hundred tasks have
completed, and queue_status() still reports about 4500 tasks
unassigned. Tasks are completing very slowly, and most of the
ipengines appear to be idle.
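To put numbers on it (using only the figures above):

```python
# Back-of-the-envelope: how many tasks the controller has handed out.
total_tasks = 16000
unassigned = 4500              # from queue_status() after ~20 minutes
dispatched = total_tasks - unassigned
print(dispatched)  # 11500 tasks dispatched (or stuck in flight)
```

So the scheduler has accepted most of the work, yet the engines are
mostly idle, which is why I suspect the controller itself is the
bottleneck.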
The only thing using significant CPU is ipcontroller, which is taking
up 100% of its core. It doesn't seem to be using significant memory,
though.
Has anyone else run into limitations like these? Is there some way
around them? Do I simply have a bad configuration, or is there
something more fundamental that might be wrong here?
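One workaround I'm considering (just a sketch; run_one is a
placeholder for my real task function) is batching arguments so the
controller sees fewer, larger tasks:

```python
def chunk(seq, size):
    """Split seq into lists of at most `size` items."""
    return [list(seq[i:i + size]) for i in range(0, len(seq), size)]

def run_batch(args_batch):
    # Placeholder: run my real per-item task for each argument.
    return [run_one(a) for a in args_batch]

# 16000 task arguments become 160 controller-level tasks of 100 each,
# e.g. submitted with lview.map_async(run_batch, batches).
batches = chunk(range(16000), 100)
print(len(batches))  # 160
```

This would cut the number of messages the controller has to route by
100x, at the cost of coarser load balancing.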