[IPython-user] How is a TaskClient "fault tolerant"? And can it play nice with PBS queueing?

Jon Olav Vik jonovik@gmail....
Mon Feb 8 13:11:50 CST 2010


I'm acquainting myself with parallel IPython and have a couple of questions.

1. Could someone please explain what it means that a TaskClient is "fault 
tolerant"?
http://ipython.scipy.org/doc/stable/html/parallel/parallel_task.html

2. The task interface sounds useful for embarrassingly parallel computations. 
I'm trying to follow the instructions at 
http://ipython.scipy.org/doc/stable/html/parallel/parallel_process.html#using-ipcluster-in-pbs-mode
(PBS is the queueing system used by the computer cluster I'm working with).

I use the command 
ipcluster pbs -n 8 --pbs-script=pbs.template &
to run the following pbs script:

#PBS -N ipython
#PBS -j oe
#PBS -l walltime=00:10:00
#PBS -l nodes=${n/8}:ppn=8
#PBS -q express
cd $$PBS_O_WORKDIR
mpiexec -n ${n} ipengine --logfile=$$PBS_O_WORKDIR/ipengine &
sleep 30
python ipar.py

...where ipar.py starts a MultiEngineClient and calls execute() to run commands 
that use MPI on the ipengines. (I haven't tried using it with a TaskClient yet.)

Note that I'm starting mpiexec in the background; otherwise it would never 
return and my Python script would never get called. I'm also backgrounding the 
call to ipcluster, because that never seems to finish either. (Launching 
"python ipar.py" itself through mpiexec does not seem to be required.)

However, the compute cluster's user instructions say I shouldn't start 
processes in the background, because then they escape the control of the job 
scheduler. Is there a way I can make TaskClient() work under this restriction? 
Otherwise, I'm just going to manually "killall ipcluster" etc. once my job is 
done. (Or maybe that could go as the last lines of my pbs script?)
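
To make that last idea concrete, here is a minimal sketch of what such closing 
lines might look like. It is plain sh, not a tested ipcluster template: 
"sleep 300" stands in for the backgrounded mpiexec/ipengine line, "echo" stands 
in for "python ipar.py", and the variable names are made up for illustration. 
Inside the real template every literal $ would presumably have to be doubled, 
as in the script above.

```shell
#!/bin/sh
# Hypothetical stand-in for: mpiexec -n ${n} ipengine --logfile=... &
sleep 300 &
engine_pid=$!        # remember the PID of the background process

# Hypothetical stand-in for: python ipar.py
echo "running client script"
client_status=$?     # exit status of the client step

# Last lines of the job: stop the background process and reap it,
# so that nothing is left running when the job script exits.
kill "$engine_pid" 2>/dev/null
wait "$engine_pid" 2>/dev/null

# Check whether the background process is really gone.
engine_alive=no
if kill -0 "$engine_pid" 2>/dev/null; then
    engine_alive=yes
fi
echo "client exit status: $client_status, engine still running: $engine_alive"
```

Killing by the recorded PID rather than with "killall" only touches the process 
that this job started itself.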

I'm a complete newbie at this, so any hints are highly appreciated.

Best regards,
Jon Olav Vik



