[IPython-User] Getting setup on a remote cluster w/ Sun Grid Engine.

MinRK benjaminrk@gmail....
Mon Nov 14 21:45:00 CST 2011


On Mon, Nov 14, 2011 at 19:10, Ariel Rokem <arokem@gmail.com> wrote:

> Hi everyone,
>
> Following up on this thread, I am trying to get this working on the SGE on
> our local cluster (thankfully, everyone is away at a conference, so I have
> the cluster pretty much to myself. Good week for experimenting...).
>
> I updated my fork from ipython/master this afternoon and followed the
> instructions below. I am getting the following behavior:
>
> celadon:~  $ipcluster start --n=10 --profile=sge
> [IPClusterStart] Using existing profile dir: u'/home/arokem/.config/ipython/profile_sge'
> [IPClusterStart] Starting ipcluster with [daemon=False]
> [IPClusterStart] Creating pid file: /home/arokem/.config/ipython/profile_sge/pid/ipcluster.pid
> [IPClusterStart] Starting PBSControllerLauncher: ['qsub', u'./sge_controller']
> [IPClusterStart] adding job array settings to batch script
> ERROR:root:Error in periodic callback
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/site-packages/zmq/eventloop/ioloop.py", line 423, in _run
>     self.callback()
>   File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/ipclusterapp.py", line 497, in start_controller
>     self.controller_launcher.start()
>   File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/launcher.py", line 1022, in start
>     return super(SGEControllerLauncher, self).start(1)
>   File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/launcher.py", line 936, in start
>     self.write_batch_script(n)
>   File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/launcher.py", line 925, in write_batch_script
>     script_as_string = self.formatter.format(self.batch_template, **self.context)
>   File "/usr/lib64/python2.7/string.py", line 545, in format
>     return self.vformat(format_string, args, kwargs)
>   File "/usr/lib64/python2.7/string.py", line 549, in vformat
>     result = self._vformat(format_string, args, kwargs, used_args, 2)
>   File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/utils/text.py", line 652, in _vformat
>     obj = eval(field_name, kwargs)
>   File "<string>", line 1, in <module>
> NameError: name 'n' is not defined
> [IPClusterStart] Starting 10 engines
> [IPClusterStart] Starting 10 engines with SGEEngineSetLauncher: ['qsub', u'./sge_engines']
> [IPClusterStart] adding job array settings to batch script
> [IPClusterStart] Writing instantiated batch script: ./sge_engines
> [IPClusterStart] Job submitted with job id: '430658'
> [IPClusterStart] Process 'qsub' started: '430658'
> [IPClusterStart] Engines appear to have started successfully
>
> It looks like something goes wrong (the NameError), but the jobs still get
> submitted. For a brief time, qmon acknowledges a list of jobs with that id,
> but it disappears from qmon almost immediately (somehow gets deleted?).
> When I then try to initialize a parallel.Client with the "sge" profile in
> an ipython session, I get a "TimeoutError: Hub connection request timed
> out". I also tried starting ipcluster with the default profile and ran
> some computations, and I get approximately the expected 7-fold speed-up
> (on an 8 core machine), so some things do work. Does anyone have any idea
> what is going wrong with SGE?
>

This is a horrible typo that crept in when I did some reorganization in the
launchers.  Should be fixed in master.
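
For anyone wondering how a batch template can raise a NameError at all: the
traceback shows field names being evaluated with eval(field_name, kwargs), so
any {name} in the template that is missing from the context fails in exactly
this way. Here is a minimal stand-in, not the actual IPython code; the class
name, the one-line template, and the context are made up for illustration:

```python
import string


class EvalFormatter(string.Formatter):
    """Toy formatter that evaluates each {field} as a Python expression."""

    def get_field(self, field_name, args, kwargs):
        # Evaluate the field against a copy of the context; a name that is
        # missing from the context raises NameError.
        return eval(field_name, dict(kwargs)), field_name


# Hypothetical one-line template using SGE's job array directive.
template = "#$ -t 1-{n}"

formatter = EvalFormatter()
print(formatter.format(template, n=10))

try:
    formatter.format(template)  # the context lacks 'n'
except NameError as err:
    print("NameError:", err)
```

So a template field and a context key only have to disagree by one name for
the launcher to fail before anything is submitted.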

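The TimeoutError in the client generally means that the controller isn't
running, or at least isn't where the connection files claim it to be. In
your case the traceback shows the controller launch itself failing, so there
was most likely no controller for the client to reach.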
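
A quick way to tell those two cases apart is to check whether anything is
listening at the address recorded in the connection file. A rough sketch
using only the standard library; the helper names are made up, and how you
pull the tcp://host:port URL out of the JSON file depends on your IPython
version, so the URL is simply passed in as a string here:

```python
import socket
from urllib.parse import urlparse


def parse_tcp_url(url):
    """Split a 'tcp://host:port' URL into (host, port)."""
    parts = urlparse(url)
    if parts.scheme != "tcp" or parts.hostname is None or parts.port is None:
        raise ValueError("expected tcp://host:port, got %r" % (url,))
    return parts.hostname, parts.port


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, timed out, unroutable, or unresolvable host.
        return False
    sock.close()
    return True
```

A successful connection only shows that something is listening there, not
that it is the hub, and the check has to be run from a machine that can see
the controller's host directly (for example, the login node).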



>
> Thanks,
>
> Ariel
>
>
>
>
> On Wed, Aug 24, 2011 at 3:07 PM, MinRK <benjaminrk@gmail.com> wrote:
>
>> On Wed, Aug 24, 2011 at 15:05, Dharhas Pothina
>> <Dharhas.Pothina@twdb.state.tx.us> wrote:
>> >
>> > I was able to start the engines and they were submitted to the queue
>> > properly but I do not have a json file in the corresponding security
>> > folder. Do I need to do something to generate it?
>>
>> The JSON file is written by ipcontroller, so it will only show up
>> after the controller has started.
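
If you script the startup, one option is to poll for the connection file
before trying to connect a client. A minimal sketch using only the standard
library; the function name and the default timings are made up for
illustration:

```python
import os
import time


def wait_for_file(path, timeout=60.0, poll_interval=0.5):
    """Poll until path exists as a file; return False if timeout expires."""
    deadline = time.time() + timeout
    while True:
        if os.path.isfile(path):
            return True
        if time.time() >= deadline:
            return False
        time.sleep(poll_interval)
```

With a batch system the controller job may sit in the queue for a while, so
the timeout probably needs to be generous.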
>>
>> >
>> > - dharhas
>> >
>> >>>> MinRK <benjaminrk@gmail.com> 8/24/2011 4:44 PM >>>
>> > On a login node on the cluster:
>> >
>> > # create profile with default parallel config files, called sge
>> > [login] $> ipython profile create sge --parallel
>> >
>> > Edit IPYTHON_DIR/profile_sge/ipcontroller_config.py, adding the line:
>> >
>> > c.HubFactory.ip = '0.0.0.0'
>> >
>> > to instruct the controller to listen on all interfaces.
>> >
>> > Edit IPYTHON_DIR/profile_sge/ipcluster_config.py, adding the lines:
>> >
>> > c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
>> > c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
>> >
>> > to instruct ipcluster to use SGE to launch the engines and the
>> > controller. Optionally, specify a queue for all jobs:
>> >
>> > c.SGELauncher.queue = 'short'
>> >
>> > At this point, you can start 10 engines and a controller with:
>> >
>> > [login] $> ipcluster start -n 10 --profile=sge
>> >
>> > Now the only file you will need to connect to the cluster will be in:
>> >
>> > IPYTHON_DIR/profile_sge/security/ipcontroller-client.json
>> >
>> > Just move that file around, and you will be able to connect clients.
>> > To connect from a laptop, you will probably need to specify a login
>> > node as the ssh server when you do:
>> >
>> > from IPython import parallel
>> >
>> > rc = parallel.Client('/path/to/ipcontroller-client.json',
>> >                      sshserver='you@login.mycluster.etc')
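
Once a client connects, a quick way to confirm the engines are usable is to
push a trivial computation through all of them. A minimal sketch; the helper
and function names are made up here, and it relies only on the client
supporting client[:] and the resulting view supporting map_sync, as
parallel.Client does:

```python
def square(x):
    """Trivial function to run remotely."""
    return x * x


def run_on_all_engines(client, func, data):
    """Map func over data using a view on every engine of the client."""
    view = client[:]  # view on all currently registered engines
    return view.map_sync(func, data)  # blocks until all results arrive


# Usage against a real cluster (path and host are placeholders):
#
#   from IPython import parallel
#   rc = parallel.Client('/path/to/connection/file.json',
#                        sshserver='you@login.mycluster.etc')
#   print(rc.ids)  # ids of the registered engines
#   print(run_on_all_engines(rc, square, range(20)))
```

If rc.ids comes back empty, the controller is up but no engines have
registered yet, which usually means the engine jobs are still queued.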
>> >
>> > -MinRK
>> >
>> >
>> > On Wed, Aug 24, 2011 at 13:18, Dharhas Pothina
>> > <Dharhas.Pothina@twdb.state.tx.us> wrote:
>> >> Hi All,
>> >>
>> >> We have managed to parallelize one of our spatial interpolation scripts
>> >> very easily with the new ipython parallel. Thanks for developing such a
>> >> great tool, it was fairly easy to get working. Now we are trying to set
>> >> things up to run on our internal cluster and I'm having difficulties
>> >> understanding how to configure things.
>> >>
>> >> What I would like to do is have ipython running on a local machine
>> >> (windows & linux) connect to the cluster, request some nodes through SGE
>> >> and run the computation. I'm not quite getting what goes where from the
>> >> documentation.
>> >>
>> >> I think I understood the PBS example but I'm still not understanding
>> >> where I would put the connection information to log into the cluster. I
>> >> would really appreciate a step by step of what files need to be where
>> >> and any example config files for an SGE setup.
>> >>
>> >> thanks,
>> >>
>> >> - dharhas
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> IPython-User mailing list
>> >> IPython-User@scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/ipython-user