[IPython-dev] SciPy Sprint summary

Brian Granger ellisonbg@gmail....
Tue Jul 20 12:02:58 CDT 2010


On Tue, Jul 20, 2010 at 7:48 AM, Justin Riley <justin.t.riley@gmail.com> wrote:
> On 07/19/2010 01:06 AM, Brian Granger wrote:
>> * I like the design of the BatchEngineSet.  This will be easy to port to
>>   0.11.
> Excellent :D
>
>> * I think if we are going to have default submission templates, we need to
>>   expose the queue name to the command line.  This shouldn't be too tough.
>
> Added --queue option to my 0.10.1-sge branch and tested this with SGE
> 62u3 and Torque 2.4.6. I don't have LSF to test but I added in the code
> that *should* work with LSF.

Awesome!
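
For anyone following along, here is a rough sketch of what exposing the
queue name to a default submission template could look like. This is not
the code in the 0.10.1-sge branch: the template text, render_sge_script,
parse_args and the "ipcluster-sketch" program name are all made up for
illustration, and it assumes a current Python with argparse. Only the
"#$ -q" and "#$ -t" directives are standard SGE.

```python
import argparse

# Hypothetical default template; the real one in the branch may differ.
SGE_TEMPLATE = """#$ -V
#$ -N ipengine
#$ -t 1-{n}
{queue_line}ipengine
"""


def render_sge_script(n, queue=None):
    """Fill the template; omit the -q directive when no queue is given."""
    queue_line = "#$ -q {0}\n".format(queue) if queue else ""
    return SGE_TEMPLATE.format(n=n, queue_line=queue_line)


def parse_args(argv):
    """Parse a minimal, illustrative subset of command-line options."""
    parser = argparse.ArgumentParser(prog="ipcluster-sketch")
    parser.add_argument("-n", type=int, default=2, help="number of engines")
    parser.add_argument("--queue", default=None, help="batch queue name")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["-n", "4", "--queue", "all.q"])
    print(render_sge_script(args.n, args.queue))
```

Leaving the -q line out when no queue is given lets the scheduler fall
back to its own default queue, so the default template stays usable on
sites that never pass --queue.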

>> * Have you tested this with Python 2.6?  I saw that you mentioned that
>>   the engines were shutting down cleanly now.  What did you do to fix that?
>>   I am even running into that in 0.11 so any info you can provide would
>>   be helpful.
>
> I've been testing the code with Python 2.6. I didn't do anything special
> other than switch the BatchEngineSet to using job arrays (i.e. a single
> qsub command instead of N qsubs). Now when I run "ipcluster sge -n 4"
> the controller starts and the engines are launched and at that point the
> ipcluster session is running indefinitely. If I then ctrl-c the
> ipcluster session it catches the signal and calls kill() which
> terminates the engines by canceling the job. Is this the same situation
> you're trying to get working?

Basically yes, but sometimes the signal is not killing the batch job.
I just need to debug this further.
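
To pin down the flow being described (one job-array submission, then
cancel the job on ctrl-c), here is a minimal sketch. It is not the
BatchEngineSet implementation: BatchJob, parse_job_id and
run_until_interrupted are hypothetical names, the output parsing assumes
SGE-style qsub messages, and signal.pause() is Unix only.

```python
import re
import signal
import subprocess


def parse_job_id(qsub_output):
    """Extract the numeric job id from SGE-style qsub output."""
    match = re.search(r"\d+", qsub_output)
    if match is None:
        raise ValueError("no job id found in: %r" % qsub_output)
    return match.group(0)


class BatchJob(object):
    """One job-array submission that can be cancelled as a unit."""

    def __init__(self, script_path, submit_cmd="qsub", delete_cmd="qdel"):
        self.script_path = script_path
        self.submit_cmd = submit_cmd
        self.delete_cmd = delete_cmd
        self.job_id = None

    def start(self):
        # A single submission; the array size lives in the script (#$ -t 1-N).
        out = subprocess.check_output([self.submit_cmd, self.script_path])
        self.job_id = parse_job_id(out.decode())

    def kill(self):
        # Cancelling the one job id takes down every engine in the array.
        if self.job_id is not None:
            subprocess.call([self.delete_cmd, self.job_id])
            self.job_id = None


def run_until_interrupted(job):
    """Submit the job, wait for ctrl-c, and always cancel on the way out."""
    job.start()
    try:
        signal.pause()  # block until a signal arrives
    except KeyboardInterrupt:
        pass
    finally:
        job.kill()
```

With N separate qsubs there are N job ids to track and cancel; with a
job array there is only one, which is one plausible reason the shutdown
became more reliable.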

>> * For now, let's stick with the assumption of a shared $HOME for the furl files.
>> * The biggest thing is if people can test this thoroughly.  I don't have
>>   SGE/PBS/LSF access right now, so it is a bit difficult for me to help. I
>>   have a cluster coming later in the summer, but it is not here yet.  Once
>>   people have tested it well and are satisfied with it, let's merge it.
>> * If we can update the documentation about how the PBS/SGE support works
>>   that would be great.  The file is here:
>
> That sounds fine to me. I'm testing this stuff on my workstation's local
> sge/torque queues and it works fine. I'll also test this with
> StarCluster and make sure it works on a real cluster. If someone else
> can test using LSF on a real cluster (with shared $HOME) that'd be
> great. I'll try to update the docs some time this week.

That would be great.  Also when this is working I would like to test it myself.

>>
>> Once these small changes have been made and everyone has tested, we
>> can merge it for the 0.10.1 release.
> Excellent :D
>
>> Thanks for doing this work Justin and Satra!  It is fantastic!  Just
>> so you all know where this is going in 0.11:
>>
>> * We are going to get rid of using Twisted in ipcluster.  This means we have
>>   to re-write the process management stuff to use things like popen.
>> * We have a new configuration system in 0.11.  This allows users to maintain
>>   cluster profiles that are a set of configuration files for a particular
>>   cluster setup.  This makes it easy for a user to have multiple clusters
>>   configured, which they can then start by name.  The logging, security, etc.
>>   is also different for each cluster profile.
>> * It will be quite a bit of work to get everything working in 0.11, so I am
>>   glad we are getting good PBS/SGE support in 0.10.1.
>
> I'm willing to help out with the PBS/SGE/LSF portion of ipcluster in
> 0.11, I guess just let me know when it is appropriate to start hacking.

That is great, we will keep you posted.

Cheers,

Brian

> Thanks!
>
> ~Justin
>



-- 
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu
ellisonbg@gmail.com
