[IPython-dev] Ipython parallel and PBS

MinRK benjaminrk@gmail....
Fri Sep 13 13:14:17 CDT 2013


Can you inspect the pbs_engines template, and see if anything looks wrong?
Can you submit it manually, with qsub ./pbs_engines?
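ipcluster writes the filled-in template to a file (./pbs_engines, according to submit_args in the qstat -f output below) and then calls qsub on that file. To see exactly what the scheduler receives, you can render a template yourself and submit the result by hand. The sketch below assumes a job-array template modeled on the one IPython uses for PBSEngineSetLauncher. The exact default text may differ between IPython versions. {n}, {queue} and {profile_dir} are the placeholders the PBS launchers fill in. `render_engine_script` and the output path are hypothetical.

```python
import os
import tempfile

# Assumed template, modeled on a PBS job-array engine script.
# {n}, {queue} and {profile_dir} are filled in by str.format below;
# the remaining lines are illustrative, not IPython's exact default.
ENGINE_TEMPLATE = """#!/bin/sh
#PBS -V
#PBS -N ipengine
#PBS -q {queue}
#PBS -t 1-{n}
ipengine --profile-dir="{profile_dir}"
"""


def render_engine_script(n, queue, profile_dir, path):
    """Fill in the template and write it to `path`, so it can be
    inspected and then submitted manually with `qsub <path>`."""
    script = ENGINE_TEMPLATE.format(n=n, queue=queue, profile_dir=profile_dir)
    with open(path, "w") as f:
        f.write(script)
    return script


if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "pbs_engines_test")
    profile = os.path.expanduser("~/.ipython/profile_pbs")
    print(render_engine_script(2, "long", profile, out))
```

If a hand-submitted copy of the script also sits in state Q, the problem lies with the scheduler or the queue rather than with IPython.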


On Fri, Sep 13, 2013 at 3:38 AM, James <jamesresearching@gmail.com> wrote:

> Dear all,
>
> I'm having a lot of trouble setting up IPython parallel on a PBS cluster,
> and I would really appreciate any help.
>
> The architecture is a standard PBS cluster - a head node with slave nodes.
> I connect to the head node from my laptop over ssh.
>
> The client (laptop) -> Head node connection seems simple enough. The
> problem is the engines.
>
> Ignoring the laptop for a moment, I'll just focus on running ipython on
> the head node, with the engines on a slave node. I assume this is a correct
> way of working?
>
> I did the following on the head node, following instructions at
> http://ipython.org/ipython-doc/stable/parallel/parallel_process.html#using-ipcluster-in-pbs-mode:
>
> $ ipython profile create --parallel --profile=pbs
>
> Files are as follows:
>
> $ cat ipcluster_config.py
> c = get_config()
> c.IPClusterStart.controller_launcher_class = 'PBSControllerLauncher'
> c.IPClusterEngines.engine_launcher_class = 'PBSEngineSetLauncher'
> c.PBSLauncher.queue = 'long'
> # Run 2 cores on 1 node, or 2 nodes with all cores? Not sure.
> c.IPClusterEngines.n = 2
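The PBS launchers do not treat n directly as a number of cores or nodes. n is substituted for {n} in the batch-script template. With a job-array template (the qstat -f output below shows job_array_request = 1-2), that produces n separate single-engine array tasks, placed wherever the scheduler decides. If you want explicit placement, or if array jobs turn out to be the problem on this scheduler, you can give PBSEngineSetLauncher your own template.

Here is a sketch of such a fragment, appended to the ipcluster_config.py above. It assumes mpiexec is available on the compute nodes. The `#PBS -l` resource line is a site-specific assumption.

```python
# Appended to ipcluster_config.py (sketch; resource requests are site-specific).
# One ordinary job (no '#PBS -t' array) that starts {n} engines on one node.
c.PBSEngineSetLauncher.batch_template = """#!/bin/sh
#PBS -N ipengine
#PBS -j oe
#PBS -l nodes=1:ppn={n}
#PBS -q {queue}
cd $PBS_O_WORKDIR
mpiexec -n {n} ipengine --profile-dir={profile_dir}
"""
```

With this template, n = 2 means "two engines on one node with two processors". To spread engines across several nodes, change the `#PBS -l` request instead.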
>
> $ cat ipengine_config.py
> c = get_config()
>
> Then execute on the head node:
> $ ipcluster start --profile=pbs -n 2
> 2013-09-10 15:02:46,771.771 [IPClusterStart] Using existing profile dir: u'/home/username/.ipython/profile_pbs'
> 2013-09-10 15:02:46.777 [IPClusterStart] Starting ipcluster with [daemon=False]
> 2013-09-10 15:02:46.778 [IPClusterStart] Creating pid file: /home/username/.ipython/profile_pbs/pid/ipcluster.pid
> 2013-09-10 15:02:46.778 [IPClusterStart] Starting Controller with PBSControllerLauncher
> 2013-09-10 15:02:46.792 [IPClusterStart] Job submitted with job id: '2830'
> 2013-09-10 15:02:47.793 [IPClusterStart] Starting 2 Engines with PBSEngineSetLauncher
> 2013-09-10 15:02:47.808 [IPClusterStart] Job submitted with job id: '2831'
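"Job submitted" only means qsub accepted the scripts. The engines are usable only once they have registered with the controller. The sketch below polls until a given number of engines show up, and fails with an explicit error instead of waiting forever. `wait_for_engines` is a hypothetical helper. `get_ids` can be any zero-argument callable that returns the current engine ids.

```python
import time


def wait_for_engines(get_ids, n, timeout=300.0, interval=5.0):
    """Poll get_ids() until at least n engine ids are reported.

    Returns the list of ids, or raises RuntimeError once `timeout`
    seconds have passed without enough engines registering.
    """
    deadline = time.time() + timeout
    while True:
        ids = list(get_ids())
        if len(ids) >= n:
            return ids
        if time.time() >= deadline:
            raise RuntimeError(
                "only %d of %d engines registered after %.0f s"
                % (len(ids), n, timeout))
        time.sleep(interval)
```

Once the controller is running, IPython.parallel can supply the ids: with `rc = Client(profile='pbs')`, pass `lambda: rc.ids` as `get_ids`. While the controller job itself is still queued, no connection file exists yet, so the Client cannot be created at all.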
>
> Then the queue shows:
> $ qstat
> Job id                    Name             User            Time Use S Queue
> ------------------------- ---------------- --------------- -------- - -----
> 2830[].master              ipcontroller     username              0 Q long
> 2831[].master              ipengine         username              0 Q long
>
> And they just hang there, queued forever. I assume at least the engines
> should be running? The full "qstat -f" output doesn't give a reason for the
> queuing, although normally it would. There are more than 4 nodes
> available.
>
> $ qstat -f
> Job Id: 2831[].master.domain
>     Job_Name = ipengine
>     Job_Owner = username@master.domain
>     job_state = Q
>     queue = long
>     server = [head node's domain address]
>     Checkpoint = u
>     ctime = Tue Sep 10 15:02:47 2013
>     Error_Path = master.domain:/home/username/ipengine.e2831
>     Hold_Types = n
>     Join_Path = n
>     Keep_Files = n
>     Mail_Points = a
>     mtime = Tue Sep 10 15:02:47 2013
>     Output_Path = master.domain:/home/username/ipengine.o2831
>     Priority = 0
>     qtime = Tue Sep 10 15:02:47 2013
>     Rerunable = True
>     [...]
>     etime = Tue Sep 10 15:02:47 2013
>     submit_args = ./pbs_engines
>     job_array_request = 1-2
>     fault_tolerant = False
>     submit_host = master.domain
>     init_work_dir = /home/username
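Two details in this output show that the engines were submitted as a PBS job array: the `[]` in the job id and `job_array_request = 1-2`, which is one array element per engine. How array jobs are scheduled and reported can vary between sites, so this may be worth raising with the cluster admins. For scanning longer `qstat -f` dumps, the small parser below pulls out the relevant fields. `parse_qstat_f` is a hypothetical helper, not part of PBS or IPython. It assumes the "key = value" layout shown here, where wrapped values continue on the next line.

```python
def parse_qstat_f(text):
    """Parse `qstat -f` output into {job_id: {attribute: value}}.

    'Job Id:' lines start a new job; 'key = value' lines are attributes.
    Any other non-blank line is treated as a wrapped continuation of
    the previous value and glued back onto it.
    """
    jobs = {}
    current = None
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Job Id:"):
            current = jobs.setdefault(line.split(":", 1)[1].strip(), {})
            last_key = None
        elif current is not None and " = " in line:
            key, value = line.split(" = ", 1)
            last_key = key.strip()
            current[last_key] = value.strip()
        elif current is not None and last_key is not None:
            current[last_key] += line
    return jobs
```

For example, `parse_qstat_f(out)["2831[].master.domain"]["job_state"]` gives `"Q"` for the output above.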
>
> It also seems strange that the ipcontroller is launched through PBS. I
> thought this should be on the head node, so I changed
> 'PBSControllerLauncher' to 'LocalControllerLauncher'. Then it doesn't
> queue, but I don't know if what I'm doing is correct.
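Running the controller on the head node with LocalControllerLauncher is a reasonable setup. In that case the engines on the compute nodes must be able to reach the controller over the network, while by default it listens only on localhost. The sketch below is a config fragment for the two profile files. It assumes two things: the compute nodes can connect to the head node directly, and they share /home, so they can read the connection file from the profile's security directory.

```python
# In ipcluster_config.py: start the controller as a plain local process.
c.IPClusterStart.controller_launcher_class = 'LocalControllerLauncher'

# In ipcontroller_config.py: listen on all interfaces rather than only
# localhost, so engines on the compute nodes can reach the controller.
c.HubFactory.ip = '*'
```

With this setup only the engine job goes through PBS. If the engine job still stays queued, the cause is on the scheduler side rather than in IPython.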
>
> Any help would be greatly appreciated.
>
> Thank you.
>
> James
>
> _______________________________________________
> IPython-dev mailing list
> IPython-dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-dev
>
>

