[IPython-user] How is a TaskClient "fault tolerant"? And can it play nice with PBS queueing?
Jon Olav Vik
jonovik@gmail....
Mon Feb 8 13:11:50 CST 2010
I'm acquainting myself with parallel IPython and have a couple of questions.
1. Could someone please explain what it means that a TaskClient is "fault
tolerant"?
http://ipython.scipy.org/doc/stable/html/parallel/parallel_task.html
2. The task interface sounds useful for embarrassingly parallel computations.
I'm trying to follow the instructions at
http://ipython.scipy.org/doc/stable/html/parallel/parallel_process.html#using-ipcluster-in-pbs-mode
(PBS is the queueing system used by the computer cluster I'm working with).
I use the command
ipcluster pbs -n 8 --pbs-script=pbs.template &
to run the following pbs script:
#PBS -N ipython
#PBS -j oe
#PBS -l walltime=00:10:00
#PBS -l nodes=${n/8}:ppn=8
#PBS -q express
cd $$PBS_O_WORKDIR
mpiexec -n ${n} ipengine --logfile=$$PBS_O_WORKDIR/ipengine &
sleep 30
python ipar.py
...where ipar.py starts a MultiEngineClient and calls execute() to run commands
that use MPI on the ipengines. (I haven't tried using it with a TaskClient yet.)
Note that I'm starting mpiexec in the background; otherwise, it would never
finish and my Python script would never get called. Also, I'm backgrounding the
call to ipcluster because that too never seems to finish. (Using mpiexec with
"python ipar.py" does not seem to be required.)
However, the compute cluster's user instructions say I shouldn't start
processes in the background, because then they escape the control of the job
scheduler. Is there a way I can make TaskClient() work under this restriction?
Otherwise, I'm just going to manually "killall ipcluster" etc. once my job is
done. (Or maybe that could go as the last lines of my pbs script?)
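To make that last idea concrete, the end of the script might look roughly like the sketch below. It is untested on the cluster and only illustrates the pattern: remember the PID of the backgrounded job, do the work, then kill that job and wait for it. Here "sleep 300" is a stand-in for the backgrounded mpiexec line and "true" is a stand-in for "python ipar.py"; the variable name engines_pid is made up. Inside the ipcluster template each "$" would presumably have to be doubled, as in $$PBS_O_WORKDIR above.

```shell
# Sketch only: stand-ins replace the real commands so this runs anywhere.
sleep 300 &                     # stand-in for: mpiexec -n ... ipengine ... &
engines_pid=$!                  # PID of the background job just started

true                            # stand-in for: python ipar.py

# Clean up explicitly instead of leaving the background job running.
kill "$engines_pid" 2>/dev/null || true
wait "$engines_pid" 2>/dev/null || true   # reap it; ignore its non-zero exit status
```

Whether this satisfies the cluster's rule is a separate question, since the engines are still started in the background; it only ensures they are gone before the job script ends.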
I'm a complete newbie at this, so any hints would be highly appreciated.
Best regards,
Jon Olav Vik