[IPython-dev] SciPy Sprint summary
Tue Jul 20 12:02:58 CDT 2010
On Tue, Jul 20, 2010 at 7:48 AM, Justin Riley <firstname.lastname@example.org> wrote:
> On 07/19/2010 01:06 AM, Brian Granger wrote:
>> * I like the design of the BatchEngineSet. This will be easy to port to
> Excellent :D
>> * I think if we are going to have default submission templates, we need to
>> expose the queue name to the command line. This shouldn't be too tough.
> Added --queue option to my 0.10.1-sge branch and tested this with SGE
> 62u3 and Torque 2.4.6. I don't have LSF to test but I added in the code
> that *should* work with LSF.
>> * Have you tested this with Python 2.6? I saw that you mentioned that
>> the engines were shutting down cleanly now. What did you do to fix that?
>> I am even running into that in 0.11 so any info you can provide would
>> be helpful.
> I've been testing the code with Python 2.6. I didn't do anything special
> other than switch the BatchEngineSet to using job arrays (i.e., a single
> qsub command instead of N qsubs). Now when I run "ipcluster sge -n 4"
> the controller starts and the engines are launched and at that point the
> ipcluster session is running indefinitely. If I then ctrl-c the
> ipcluster session it catches the signal and calls kill() which
> terminates the engines by canceling the job. Is this the same situation
> you're trying to get working?
Basically yes, but sometimes the signal is not killing the batch job.
I just need to debug this further.
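
For reference, the pattern described above (one job-array submission, then
cancel the whole array on ctrl-c) can be sketched roughly as follows. This
is only an illustration, not the code in the 0.10.1-sge branch: the function
names are hypothetical, it assumes SGE/Torque-style "qsub -t" and "qdel",
and the job ID format is scheduler-specific, so it is passed through as an
opaque string.

```python
import subprocess


def build_submit_command(n_engines, script, queue=None):
    """Build a single qsub command that submits n_engines tasks as one job array."""
    cmd = ["qsub", "-t", "1-%d" % n_engines]
    if queue is not None:
        # Optional queue name, e.g. taken from a --queue command-line option.
        cmd += ["-q", queue]
    cmd.append(script)
    return cmd


def build_cancel_command(job_id):
    """Build the command that cancels the job array identified by job_id."""
    return ["qdel", str(job_id)]


def make_cancel_handler(job_id, run=subprocess.call):
    """Return a signal handler that cancels the batch job, then re-raises ctrl-c.

    `run` is injectable so the handler can be exercised without a scheduler.
    """
    def handler(signum, frame):
        run(build_cancel_command(job_id))
        raise KeyboardInterrupt
    return handler


# Hypothetical usage, from the main thread of the launcher:
#   import signal
#   signal.signal(signal.SIGINT, make_cancel_handler("1234"))
```

Because there is only one job ID to track, a single cancel command covers
every engine, which is presumably why the job-array approach shuts down more
cleanly than N separate submissions.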
>> * For now, let's stick with the assumption of a shared $HOME for the furl files.
>> * The biggest thing is if people can test this thoroughly. I don't have
>> SGE/PBS/LSF access right now, so it is a bit difficult for me to help. I
>> have a cluster coming later in the summer, but it is not here yet. Once
>> people have tested it well and are satisfied with it, let's merge it.
>> * If we can update the documentation about how the PBS/SGE support works
>> that would be great. The file is here:
> That sounds fine to me. I'm testing this stuff on my workstation's local
> sge/torque queues and it works fine. I'll also test this with
> StarCluster and make sure it works on a real cluster. If someone else
> can test using LSF on a real cluster (with shared $HOME) that'd be
> great. I'll try to update the docs some time this week.
That would be great. Also, once this is working I would like to test it myself.
>> Once these small changes have been made and everyone has tested, we
>> can merge it for the 0.10.1 release.
> Excellent :D
>> Thanks for doing this work Justin and Satra! It is fantastic! Just
>> so you all know where this is going in 0.11:
>> * We are going to get rid of using Twisted in ipcluster. This means we have
>> to re-write the process management stuff to use things like popen.
>> * We have a new configuration system in 0.11. This allows users to maintain
>> cluster profiles that are a set of configuration files for a particular
>> cluster setup. This makes it easy for a user to have multiple clusters
>> configured, which they can then start by name. The logging, security, etc.
>> is also different for each cluster profile.
>> * It will be quite a bit of work to get everything working in 0.11, so I am
>> glad we are getting good PBS/SGE support in 0.10.1.
> I'm willing to help out with the PBS/SGE/LSF portion of ipcluster in
> 0.11; just let me know when it is appropriate to start hacking.
That is great, we will keep you posted.
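
To give a rough idea of what "process management with things like popen"
could look like without Twisted, here is a minimal sketch using the standard
library's subprocess module. It is not the planned 0.11 design; the function
names and the timeout value are assumptions made for illustration.

```python
import subprocess
import sys


def start_process(args):
    """Launch a child process (e.g. a controller or engine) and return its handle."""
    return subprocess.Popen(args)


def stop_process(proc, timeout=5.0):
    """Ask the child to terminate, force-kill it if it ignores the request.

    Returns the child's exit code.
    """
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode


if __name__ == "__main__":
    # Stand-in for a long-running engine: a Python child that just sleeps.
    child = start_process([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_process(child))
```

The terminate-then-kill fallback is one simple way to avoid leaving orphaned
processes behind when a child does not respond to the first request.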
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo