[IPython-User] Getting setup on a remote cluster w/ Sun Grid Engine.

Ariel Rokem arokem@gmail....
Mon Nov 14 21:10:24 CST 2011


Hi everyone,

Following up on this thread, I am trying to get this working with SGE on
our local cluster (thankfully, everyone is away at a conference, so I have
the cluster pretty much to myself. Good week for experimenting...).

I updated my fork from ipython/master this afternoon and followed the
instructions below. I am getting the following behavior:

celadon:~  $ipcluster start --n=10 --profile=sge
[IPClusterStart] Using existing profile dir: u'/home/arokem/.config/ipython/profile_sge'
[IPClusterStart] Starting ipcluster with [daemon=False]
[IPClusterStart] Creating pid file: /home/arokem/.config/ipython/profile_sge/pid/ipcluster.pid
[IPClusterStart] Starting PBSControllerLauncher: ['qsub', u'./sge_controller']
[IPClusterStart] adding job array settings to batch script
ERROR:root:Error in periodic callback
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/zmq/eventloop/ioloop.py", line 423, in _run
    self.callback()
  File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/ipclusterapp.py", line 497, in start_controller
    self.controller_launcher.start()
  File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/launcher.py", line 1022, in start
    return super(SGEControllerLauncher, self).start(1)
  File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/launcher.py", line 936, in start
    self.write_batch_script(n)
  File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/launcher.py", line 925, in write_batch_script
    script_as_string = self.formatter.format(self.batch_template, **self.context)
  File "/usr/lib64/python2.7/string.py", line 545, in format
    return self.vformat(format_string, args, kwargs)
  File "/usr/lib64/python2.7/string.py", line 549, in vformat
    result = self._vformat(format_string, args, kwargs, used_args, 2)
  File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/utils/text.py", line 652, in _vformat
    obj = eval(field_name, kwargs)
  File "<string>", line 1, in <module>
NameError: name 'n' is not defined
[IPClusterStart] Starting 10 engines
[IPClusterStart] Starting 10 engines with SGEEngineSetLauncher: ['qsub', u'./sge_engines']
[IPClusterStart] adding job array settings to batch script
[IPClusterStart] Writing instantiated batch script: ./sge_engines
[IPClusterStart] Job submitted with job id: '430658'
[IPClusterStart] Process 'qsub' started: '430658'
[IPClusterStart] Engines appear to have started successfully

It looks like something goes wrong (the NameError), but the jobs still get
submitted. For a brief time qmon does list the jobs with that id, but they
disappear from qmon almost immediately (do they somehow get deleted?). When I
then try to initialize a parallel.Client with the "sge" profile in an ipython
session, I get "TimeoutError: Hub connection request timed out". I also tried
starting ipcluster with the default profile and running some computations,
and I get approximately the expected 7-fold speed-up (on an 8-core machine),
so some things do work. Does anyone have any idea what is going wrong with
SGE?
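The last frames of the traceback show the batch script being rendered by a formatter that resolves each {field} in the template with eval(field_name, kwargs), so any field whose name is missing from the launcher's context raises exactly this NameError. Below is a minimal Python 3 sketch of that mechanism, not IPython's actual code: EvalFormatter here is a hypothetical stand-in built on string.Formatter, and the two-line template, including its SGE job-array line, is an illustrative assumption.

```python
import string


class EvalFormatter(string.Formatter):
    """Hypothetical stand-in: resolve each {field} by eval-ing its name
    against the keyword arguments, mirroring the eval(field_name, kwargs)
    frame in the traceback."""

    def get_field(self, field_name, args, kwargs):
        # Evaluate in a copy so eval's injected __builtins__ stays out of kwargs.
        namespace = dict(kwargs)
        return eval(field_name, namespace), field_name


# Illustrative batch template (assumption): a job-array line that needs `n`.
TEMPLATE = "#$ -t 1-{n}\nipcontroller --profile-dir={profile_dir}\n"

formatter = EvalFormatter()


def render_or_error(template, **context):
    """Return the rendered script, or the NameError text if a field
    references a name that is missing from the context."""
    try:
        return formatter.format(template, **context)
    except NameError as err:
        return "NameError: %s" % err


# With `n` in the context the template renders; without it, eval fails.
print(render_or_error(TEMPLATE, n=1, profile_dir="/tmp/profile_sge"))
print(render_or_error(TEMPLATE, profile_dir="/tmp/profile_sge"))
```

Rendering with n in the context succeeds, while leaving it out reproduces the error from the log. That fits the log showing only the engine job being submitted, though this sketch cannot say why n was missing from the controller launcher's context.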

Thanks,

Ariel




On Wed, Aug 24, 2011 at 3:07 PM, MinRK <benjaminrk@gmail.com> wrote:

> On Wed, Aug 24, 2011 at 15:05, Dharhas Pothina
> <Dharhas.Pothina@twdb.state.tx.us> wrote:
> >
> > I was able to start the engines, and they were submitted to the queue
> > properly, but I do not have a JSON file in the corresponding security
> > folder. Do I need to do something to generate it?
>
> The JSON file is written by ipcontroller, so it will only show up
> after the controller has started.
>
> >
> > - dharhas
> >
> >>>> MinRK <benjaminrk@gmail.com> 8/24/2011 4:44 PM >>>
> > On a login node on the cluster:
> >
> > # create profile with default parallel config files, called sge
> > [login] $> ipython profile create sge --parallel
> >
> > Edit IPYTHON_DIR/profile_sge/ipcontroller_config.py, adding the line:
> >
> > c.HubFactory.ip = '0.0.0.0'
> >
> > to instruct the controller to listen on all interfaces.
> >
> > Edit IPYTHON_DIR/profile_sge/ipcluster_config.py, adding the lines:
> >
> > c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
> > c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
> >
> > to instruct ipcluster to use SGE to launch the engines and the controller.
> > Optionally, you can also specify a queue for all jobs:
> >
> > c.SGELauncher.queue = 'short'
> >
> > At this point, you can start 10 engines and a controller with:
> >
> > [login] $> ipcluster start -n 10 --profile=sge
> >
> > Now the only file you will need to connect to the cluster will be in:
> >
> > IPYTHON_DIR/profile_sge/security/ipcontroller_client.json
> >
> > Just move that file around, and you will be able to connect clients.
> > To connect from a laptop, you will probably need to specify a login
> > node as the ssh server when you do:
> >
> > from IPython import parallel
> >
> > rc = parallel.Client('/path/to/ipcontroller_client.json',
> >                      sshserver='you@login.mycluster.etc')
> >
> > -MinRK
> >
> >
> > On Wed, Aug 24, 2011 at 13:18, Dharhas Pothina
> > <Dharhas.Pothina@twdb.state.tx.us> wrote:
> >> Hi All,
> >>
> >> We have managed to parallelize one of our spatial interpolation scripts
> >> very easily with the new IPython parallel. Thanks for developing such a
> >> great tool; it was fairly easy to get working. Now we are trying to set
> >> things up to run on our internal cluster, and I'm having difficulty
> >> understanding how to configure things.
> >>
> >> What I would like to do is have IPython running on a local machine
> >> (Windows & Linux) connect to the cluster, request some nodes through SGE,
> >> and run the computation. I'm not quite getting what goes where from the
> >> documentation.
> >>
> >> I think I understood the PBS example, but I'm still not understanding
> >> where I would put the connection information to log into the cluster. I
> >> would really appreciate a step-by-step of what files need to be where,
> >> and any example config files for an SGE setup.
> >>
> >> thanks,
> >>
> >> - dharhas
> >>
> >> _______________________________________________
> >> IPython-User mailing list
> >> IPython-User@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/ipython-user
> >>
> >>
>
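MinRK notes above that ipcontroller_client.json is written by ipcontroller, so it only appears once the controller has started. A client script can therefore poll for the file before trying to connect. The following is a minimal Python 3 sketch under that assumption; connection_file and wait_for_connection_file are hypothetical helpers, and the directory layout simply follows the IPYTHON_DIR/profile_sge/security/ipcontroller_client.json path given in the thread.

```python
import os
import time


def connection_file(ipython_dir, profile="sge"):
    """Hypothetical helper: build
    IPYTHON_DIR/profile_<profile>/security/ipcontroller_client.json."""
    return os.path.join(ipython_dir, "profile_" + profile, "security",
                        "ipcontroller_client.json")


def wait_for_connection_file(path, timeout=60.0, interval=1.0):
    """Hypothetical helper: poll until `path` exists or `timeout` seconds
    pass. Returns True if the file appeared, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, with the profile location shown in the log above, connection_file(os.path.expanduser("~/.config/ipython")) points at the sge profile's file. A file left over from an earlier run also satisfies this check, so its existence alone does not prove that a controller is currently running.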