[IPython-User] IPython Parallel on SGE (or PBS) and on-demand scheduling

Martijn Vermaat martijn@vermaat.n...
Wed Feb 20 04:53:43 CST 2013


Dear list,

(Excuse me if this message is received twice, my first attempt before
subscribing to the list didn't seem to get through.)

Today I managed to get IPython Parallel working on our cluster, using
Sun Grid Engine (SGE, which is PBS-like) to schedule the controller
and the engines.

This setup first schedules the controller and then the engines as a
single job array (using the SGE -t option). Once everything is
running, I can make use of the engines from my code. This is great!
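
For anyone interested in the details: a setup like this boils down to
something like the following in the profile's ipcluster_config.py (a
minimal sketch; the engine count and queue name are placeholders for
our local values):

    # ipcluster_config.py, e.g. in a profile created with
    # "ipython profile create --parallel --profile=sge"
    c = get_config()

    # Use the SGE launchers for the controller and the engine set.
    c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
    c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'

    # Number of engines; this becomes the size of the -t job array
    # submitted by SGEEngineSetLauncher.
    c.IPClusterEngines.n = 16

    # Queue to submit to (placeholder value).
    c.SGEEngineSetLauncher.queue = 'all.q'

The whole thing is then started with "ipcluster start --profile=sge".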

However, it doesn't really fit the way we use our cluster (and, I
think, the way many clusters are typically used), in that it is
essentially a pre-allocation of compute slots. Also, computation can
only start once all engines are running (which can take quite a while
if the cluster is busy, and I usually don't know how many slots I can
get hold of simultaneously). And until my very last computation is
done, I keep occupying all of these slots. I'm not the only user of
the cluster.

What I would like to see is some sort of dynamic scheduling of
engines (up to some defined maximum number running simultaneously),
where engines are also stopped when there is not much work left to
do.

This could be implemented, I think, by having SGEEngineSetLauncher
schedule individual engines as separate SGE jobs, instead of
scheduling one big fixed job array. Dynamic scaling of the engine pool
would require some additional complexity.
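
To make that concrete, here is a crude sketch of what I have in mind
(purely hypothetical, not existing functionality; it bypasses
SGEEngineSetLauncher and calls qsub directly, and the maximum number
of engines is made up):

    import subprocess

    MAX_ENGINES = 16   # made-up upper bound on simultaneous engines
    PROFILE = 'sge'    # assumes a profile whose controller is running

    def submit_engine():
        """Submit one single-engine SGE job, feeding the job script to
        qsub on stdin so no script file is needed."""
        job = 'ipengine --profile=%s\n' % PROFILE
        qsub = subprocess.Popen(['qsub', '-N', 'ipengine', '-cwd'],
                                stdin=subprocess.PIPE)
        qsub.communicate(job.encode())

    if __name__ == '__main__':
        # Naive version: just submit the maximum number of engines.
        # Real dynamic scaling would watch the amount of pending work
        # and submit or stop engines as needed.
        for _ in range(MAX_ENGINES):
            submit_engine()

The controller would be started separately (e.g. a plain "ipcontroller
--profile=sge" somewhere), and each engine connects to it as it comes
up; the missing piece is the logic deciding when to submit more
engines and when to shut idle ones down.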

Not having much experience with IPython Parallel, I have the feeling
this doesn't really fit its parallelism model (is the pool of engines
assumed to be fixed?). But I hope I'm wrong.

Are there other users with similar use cases?

cheers,
Martijn

