[IPython-User] IPython Parallel on SGE (or PBS) and on-demand scheduling
Tue Feb 19 09:57:13 CST 2013
Today I managed to get IPython Parallel working on our cluster using Sun
Grid Engine (SGE, or PBS-like) scheduling of controller and engines.
This setup first schedules the controller and then the engines, using
the SGE -t option (a job array). When this is all running, I can make
use of the engines from my code. This is great!
However, it doesn't fit well with the way we use our cluster (and, I
think, with how many clusters are typically used), because it is
essentially pre-allocation of compute slots. Computation can only
start when all engines are running, which can take quite a while if the
cluster is busy, and I usually don't know how many slots I can get hold
of simultaneously. And until my very last computation is done, I keep
occupying all of these slots, even though I'm not the only user of the
cluster.
What I would like to see is some sort of dynamic scheduling of engines
(with some defined maximum simultaneous number) where they are also
stopped when there is not much work to do.
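To make the idea concrete, here is a minimal sketch of the kind of
scaling policy I have in mind. It is not existing IPython code: the
function names and the parameters (pending_tasks, max_engines,
tasks_per_engine) are made up for illustration, and a real policy
would probably need some damping to avoid starting and stopping
engines too eagerly.

```python
def target_engine_count(pending_tasks, max_engines, tasks_per_engine=1):
    """Return how many engines should be running for the current backlog.

    Hypothetical policy, not part of IPython: one engine per
    `tasks_per_engine` outstanding tasks, capped at `max_engines`,
    and zero engines when there is no work.
    """
    if pending_tasks < 0 or max_engines < 0 or tasks_per_engine < 1:
        raise ValueError("invalid arguments")
    # Ceiling division: a partial batch of tasks still needs an engine.
    wanted = -(-pending_tasks // tasks_per_engine)
    return min(wanted, max_engines)


def engines_to_change(running_engines, pending_tasks, max_engines,
                      tasks_per_engine=1):
    """Positive result: engines to start; negative: engines to stop."""
    target = target_engine_count(pending_tasks, max_engines,
                                 tasks_per_engine)
    return target - running_engines
```

With a maximum of 10 engines, 25 pending tasks would ask for all 10
engines, while an empty queue would ask for all running engines to be
stopped.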
This could be implemented, I think, by having SGEEngineSetLauncher
schedule individual engines as separate SGE jobs, instead of scheduling
one big fixed job array. Dynamic scaling of the engine pool would
require some additional complexity.
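The difference between the two submission styles can be sketched
roughly as below. qsub and its -t (job array) option are standard SGE;
the script name engine.sh and both helper functions are placeholders I
made up, and this is not meant to describe how SGEEngineSetLauncher is
actually implemented.

```python
def job_array_command(n, script="engine.sh"):
    """Command line for one SGE job array with n tasks."""
    # `engine.sh` is a hypothetical script that starts a single engine.
    return ["qsub", "-t", "1-%d" % n, script]


def individual_commands(n, script="engine.sh"):
    """Command lines for n separate SGE jobs, one engine per job."""
    return [["qsub", script] for _ in range(n)]
```

In the second form each engine has its own job id, so it can be
submitted or deleted independently of the others.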
Without having much experience with IPython Parallel, I have the feeling
this doesn't really fit its parallelism model (the pool of engines is
fixed?). But I hope I'm wrong.
Are there others with similar use cases?