<br><br><div class="gmail_quote">On Mon, Nov 14, 2011 at 19:10, Ariel Rokem <span dir="ltr"><<a href="mailto:arokem@gmail.com">arokem@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hi everyone, <br><br>Following up on this thread, I am trying to get this working on the SGE on our local cluster (thankfully, everyone is away at a conference, so I have the cluster pretty much to myself. Good week for experimenting...). <br>
<br>I updated my fork from ipython/master this afternoon and followed the instructions below. I am getting the following behavior: <br><br>celadon:~ $ipcluster start --n=10 --profile=sge<br>[IPClusterStart] Using existing profile dir: u'/home/arokem/.config/ipython/profile_sge'<br>
[IPClusterStart] Starting ipcluster with [daemon=False]<br>[IPClusterStart] Creating pid file: /home/arokem/.config/ipython/profile_sge/pid/ipcluster.pid<br>[IPClusterStart] Starting PBSControllerLauncher: ['qsub', u'./sge_controller']<br>
[IPClusterStart] adding job array settings to batch script<br>ERROR:root:Error in periodic callback<br>Traceback (most recent call last):<br> File "/usr/lib64/python2.7/site-packages/zmq/eventloop/ioloop.py", line 423, in _run<br>
self.callback()<br> File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/ipclusterapp.py", line 497, in start_controller<br> self.controller_launcher.start()<br> File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/launcher.py", line 1022, in start<br>
return super(SGEControllerLauncher, self).start(1)<br> File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/launcher.py", line 936, in start<br> self.write_batch_script(n)<br> File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/launcher.py", line 925, in write_batch_script<br>
script_as_string = self.formatter.format(self.batch_template, **self.context)<br> File "/usr/lib64/python2.7/string.py", line 545, in format<br> return self.vformat(format_string, args, kwargs)<br> File "/usr/lib64/python2.7/string.py", line 549, in vformat<br>
result = self._vformat(format_string, args, kwargs, used_args, 2)<br> File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/utils/text.py", line 652, in _vformat<br> obj = eval(field_name, kwargs)<br>
File "<string>", line 1, in <module><br>NameError: name 'n' is not defined<br>[IPClusterStart] Starting 10 engines<br>[IPClusterStart] Starting 10 engines with SGEEngineSetLauncher: ['qsub', u'./sge_engines']<br>
[IPClusterStart] adding job array settings to batch script<br>[IPClusterStart] Writing instantiated batch script: ./sge_engines<br>[IPClusterStart] Job submitted with job id: '430658'<br>[IPClusterStart] Process 'qsub' started: '430658'<br>
[IPClusterStart] Engines appear to have started successfully<br><br>It looks like something goes wrong (the NameError), but then the engine jobs get submitted anyway. For a brief moment qmon does show a list of jobs with that id, but the list disappears from qmon almost immediately (it somehow gets deleted?). When I then try to initialize a parallel.Client with the "sge" profile in an IPython session, I get "TimeoutError: Hub connection request timed out". I also tried starting ipcluster with the default profile and running some computations, and I get roughly the expected 7-fold speed-up (on an 8-core machine), so some things do work. Does anyone have an idea what is going wrong with SGE? <br>
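</blockquote><div><br></div><div>This is a horrible typo that crept in when I did some reorganization in the launchers. It should be fixed in master now.</div><div><br></div><div>The TimeoutError in the client generally means that the controller isn't running, or at least isn't where the connection file claims it is. In your log, the NameError is raised while the controller's batch script is being written, so the controller job was apparently never submitted, which would explain the timeout.</div>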
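<div>For anyone curious why the typo surfaced as a NameError: the traceback shows the batch-script template being rendered by a formatter that calls eval(field_name, kwargs) on each {field}, so a template that mentions {n} fails as soon as n is missing from the context. Below is a minimal standard-library sketch of that mechanism. The EvalFormatter class and the SGE template are hypothetical illustrations (the template only mimics the "#$ -t 1-{n}" job-array style), not IPython's actual code.</div>
<pre>
```python
import string


class EvalFormatter(string.Formatter):
    """Formatter that evaluates each {field} as a Python expression.

    Sketch only: names used in the template are looked up in the keyword
    arguments passed to format(), so a missing name raises NameError.
    """

    def get_field(self, field_name, args, kwargs):
        # Fresh (empty) globals so eval() does not insert __builtins__ into
        # the caller's context; the context dict supplies the local names.
        value = eval(field_name, {}, kwargs)
        return value, field_name


# Hypothetical SGE batch template in the job-array style ("-t 1-{n}").
SGE_TEMPLATE = """#$ -V
#$ -S /bin/sh
#$ -t 1-{n}
ipengine --profile-dir={profile_dir}
"""

fmt = EvalFormatter()
context = {"profile_dir": "/home/me/.config/ipython/profile_sge"}

try:
    # Without n in the context, rendering fails just like in the log.
    fmt.format(SGE_TEMPLATE, **context)
except NameError as err:
    print("rendering failed:", err)

# With n supplied, the job-array line renders as "#$ -t 1-10".
print(fmt.format(SGE_TEMPLATE, n=10, **context))
```
</pre>
<div>Evaluating fields as expressions is what lets a template say things like {n - 1}; the price is that any missing name only shows up at render time, as it did here.</div>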
<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>Thanks, <br><font color="#888888"><br>Ariel <br></font><div><div></div><div class="h5"><br><br><br><br><div class="gmail_quote">On Wed, Aug 24, 2011 at 3:07 PM, MinRK <span dir="ltr"><<a href="mailto:benjaminrk@gmail.com" target="_blank">benjaminrk@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">
<div>On Wed, Aug 24, 2011 at 15:05, Dharhas Pothina<br>
<<a href="mailto:Dharhas.Pothina@twdb.state.tx.us" target="_blank">Dharhas.Pothina@twdb.state.tx.us</a>> wrote:<br>
><br>
> I was able to start the engines and they were submitted to the queue<br>
> properly but I do not have a json file in the corresponding security folder.<br>
> Do I need to do something to generate it?<br>
<br>
</div>The JSON file is written by ipcontroller, so it will only show up<br>
after the controller has started.<br>
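<div>Because the connection file only appears once the controller is up, a script that starts the cluster and then connects has to wait for it. A minimal polling sketch, assuming a plain timeout-based loop is acceptable; wait_for_connection_file is a hypothetical helper, not part of IPython. Pass it the path of the client JSON file in the profile's security folder.</div>
<pre>
```python
import os
import time


def wait_for_connection_file(path, timeout=30.0, interval=0.5):
    """Poll until `path` exists (ipcontroller writes it once it has started).

    Returns True if the file appeared within `timeout` seconds, else False.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.isfile(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```
</pre>
<div>On a batch system the controller job may sit in the queue for a while, so the timeout should be generous; a False result is a hint to check qstat/qmon before suspecting the network.</div>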
<div><div></div><div><br>
><br>
> - dharhas<br>
><br>
>>>> MinRK <<a href="mailto:benjaminrk@gmail.com" target="_blank">benjaminrk@gmail.com</a>> 8/24/2011 4:44 PM >>><br>
> On a login node on the cluster:<br>
><br>
> # create profile with default parallel config files, called sge<br>
> [login] $> ipython profile create sge --parallel<br>
><br>
> Edit IPYTHON_DIR/profile_sge/ipcontroller_config.py, adding the line:<br>
><br>
> c.HubFactory.ip = '0.0.0.0'<br>
><br>
> to instruct the controller to listen on all interfaces.<br>
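<div>A quick way to see what listening on all interfaces means, sketched with the standard socket module rather than IPython itself: a socket bound to '0.0.0.0' accepts connections arriving on any of the host's addresses, whereas one bound to '127.0.0.1' only accepts connections from the same machine, which would leave engines on other nodes unable to reach the controller. open_listener is a hypothetical helper for this illustration only.</div>
<pre>
```python
import socket


def open_listener(ip):
    """Hypothetical helper: listen on `ip` at an OS-chosen free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((ip, 0))  # port 0 asks the OS for any free port
    srv.listen(1)
    return srv


# '0.0.0.0' means every IPv4 interface on this host, so peers on other
# nodes can connect; binding to '127.0.0.1' would admit only local clients.
server = open_listener("0.0.0.0")
port = server.getsockname()[1]

# Loopback is one of "all interfaces", so a local connection also succeeds.
client = socket.create_connection(("127.0.0.1", port), timeout=5)
conn, addr = server.accept()
print("listening on port", port, "- accepted a connection from", addr[0])

conn.close()
client.close()
server.close()
```
</pre>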
><br>
> Edit IPYTHON_DIR/profile_sge/ipcluster_config.py, adding the lines:<br>
><br>
> c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'<br>
> c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'<br>
><br>
> # optional: specify a queue for all jobs:<br>
> c.SGELauncher.queue = 'short'<br>
><br>
> The two launcher_class lines instruct ipcluster to use SGE to launch the<br>
> engines and the controller.<br>
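<div>If you would rather script these edits, here is a minimal sketch that appends the config lines above to the two profile files, skipping any line that is already present. The helper names are hypothetical, and the substring check is a simple heuristic (for instance, it would treat a commented-out copy of a line as already present).</div>
<pre>
```python
import os


def append_config_lines(profile_dir, filename, lines):
    """Append `lines` to profile_dir/filename, skipping lines already present."""
    path = os.path.join(profile_dir, filename)
    existing = ""
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read()
    new = [line for line in lines if line not in existing]
    if new:
        with open(path, "a") as f:
            f.write("\n" + "\n".join(new) + "\n")
    return new


def configure_sge_profile(profile_dir, queue=None):
    """Add the controller and SGE launcher settings to an existing profile."""
    # Controller: listen on all interfaces so engines on other nodes connect.
    append_config_lines(profile_dir, "ipcontroller_config.py",
                        ["c.HubFactory.ip = '0.0.0.0'"])
    # ipcluster: launch both engines and controller through SGE.
    cluster = [
        "c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'",
        "c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'",
    ]
    if queue is not None:
        cluster.append("c.SGELauncher.queue = %r" % queue)
    append_config_lines(profile_dir, "ipcluster_config.py", cluster)
```
</pre>
<div>For example, configure_sge_profile(os.path.expanduser("~/.config/ipython/profile_sge"), queue="short"), where the profile path is an assumption based on the default location visible in the ipcluster log earlier in this thread. Run it after "ipython profile create sge --parallel" so the generated defaults are kept and only the new lines are appended.</div>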
><br>
> At this point, you can start 10 engines and a controller with:<br>
><br>
> [login] $> ipcluster start -n 10 --profile=sge<br>
><br>
> Now the only file you will need to connect to the cluster will be in:<br>
><br>
> IPYTHON_DIR/profile_sge/security/ipcontroller-client.json<br>
><br>
> Just move that file around, and you will be able to connect clients.<br>
> To connect from a laptop, you will probably need to specify a login<br>
> node as the ssh server when you do:<br>
><br>
> from IPython import parallel<br>
><br>
> rc = parallel.Client('/path/to/ipcontroller-client.json',<br>
> sshserver='you@login.mycluster.etc')<br>
><br>
> -MinRK<br>
><br>
><br>
> On Wed, Aug 24, 2011 at 13:18, Dharhas Pothina<br>
> <<a href="mailto:Dharhas.Pothina@twdb.state.tx.us" target="_blank">Dharhas.Pothina@twdb.state.tx.us</a>> wrote:<br>
>> Hi All,<br>
>><br>
>> We have managed to parallelize one of our spatial interpolation scripts<br>
>> very easily with the new IPython parallel. Thanks for developing such a<br>
>> great tool; it was fairly easy to get working. Now we are trying to set<br>
>> things up to run on our internal cluster, and I'm having difficulty<br>
>> understanding how to configure things.<br>
>><br>
>> What I would like to do is have IPython running on a local machine<br>
>> (Windows and Linux) connect to the cluster, request some nodes through<br>
>> SGE, and run the computation. I'm not quite getting what goes where from<br>
>> the documentation.<br>
>><br>
>> I think I understood the PBS example, but I still don't understand where<br>
>> I would put the connection information to log into the cluster. I would<br>
>> really appreciate a step-by-step description of what files need to be<br>
>> where, and any example config files for an SGE setup.<br>
>><br>
>> thanks,<br>
>><br>
>> - dharhas<br>
>><br>
>> _______________________________________________<br>
>> IPython-User mailing list<br>
>> <a href="mailto:IPython-User@scipy.org" target="_blank">IPython-User@scipy.org</a><br>
>> <a href="http://mail.scipy.org/mailman/listinfo/ipython-user" target="_blank">http://mail.scipy.org/mailman/listinfo/ipython-user</a><br>
</div></div></blockquote></div><br>
</div></div><br>
<br></blockquote></div><br>