[IPython-dev] SciPy Sprint summary
Tue Jul 20 09:48:15 CDT 2010
On 07/19/2010 01:06 AM, Brian Granger wrote:
> * I like the design of the BatchEngineSet. This will be easy to port to
> * I think if we are going to have default submission templates, we need to
> expose the queue name to the command line. This shouldn't be too tough.
Added --queue option to my 0.10.1-sge branch and tested this with SGE
62u3 and Torque 2.4.6. I don't have LSF to test but I added in the code
that *should* work with LSF.
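
To illustrate what exposing the queue name means for the submission templates, here is a minimal sketch, not the code from the 0.10.1-sge branch. The function and dictionary names (render_job_script, QUEUE_DIRECTIVES, ARRAY_DIRECTIVES) and the script layout are hypothetical. Only the directive prefixes and the -q, -t and -J flags are the schedulers' own.

```python
# Hypothetical sketch: build a batch script with an optional queue directive.
# The directive prefixes are the standard ones for each scheduler; everything
# else (names, script layout) is illustrative only.
QUEUE_DIRECTIVES = {
    "sge": "#$ -q {queue}",
    "pbs": "#PBS -q {queue}",
    "lsf": "#BSUB -q {queue}",
}

# Job-array directives: one submission that launches n engine tasks.
ARRAY_DIRECTIVES = {
    "sge": "#$ -t 1-{n}",
    "pbs": "#PBS -t 1-{n}",
    "lsf": "#BSUB -J ipengine[1-{n}]",
}


def render_job_script(scheduler, n, queue=None, command="ipengine"):
    """Return the text of a job-array script for the given scheduler."""
    lines = ["#!/bin/sh"]
    if queue:
        # Only emit a queue directive when the user asked for one,
        # so the scheduler's default queue is used otherwise.
        lines.append(QUEUE_DIRECTIVES[scheduler].format(queue=queue))
    lines.append(ARRAY_DIRECTIVES[scheduler].format(n=n))
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_job_script("sge", 4, queue="all.q"))
```

Leaving out the directive when no queue is given keeps the default template usable on sites that have only one queue.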
> * Have you tested this with Python 2.6? I saw that you mentioned that
> the engines were shutting down cleanly now. What did you do to fix that?
> I am even running into that in 0.11 so any info you can provide would
> be helpful.
I've been testing the code with Python 2.6. I didn't do anything special
other than switch the BatchEngineSet to job arrays (i.e., a single qsub
command instead of N qsubs). Now when I run "ipcluster sge -n 4", the
controller starts, the engines are launched, and the ipcluster session
then runs indefinitely. If I press ctrl-c in the ipcluster session, it
catches the signal and calls kill(), which terminates the engines by
canceling the job. Is this the same situation you're trying to get
working?
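
The shutdown path described above can be sketched roughly as follows. This is a simplified illustration, not the Twisted-based code in ipcluster: the BatchJob class, the wait_until_interrupted function and the injectable runner/wait arguments are hypothetical. It assumes the whole job array has a single job id, so one cancel command (qdel for SGE/Torque, bkill for LSF) stops every engine.

```python
import signal
import subprocess

# Cancel commands provided by each scheduler.
CANCEL_COMMANDS = {"sge": "qdel", "pbs": "qdel", "lsf": "bkill"}


class BatchJob:
    """Hypothetical handle for one submitted job array."""

    def __init__(self, scheduler, job_id, runner=subprocess.call):
        self.scheduler = scheduler
        self.job_id = job_id
        # runner is injectable so the sketch can be exercised without
        # a real scheduler installed.
        self.runner = runner

    def kill(self):
        # Cancelling the array job terminates all engines at once.
        cmd = [CANCEL_COMMANDS[self.scheduler], str(self.job_id)]
        return self.runner(cmd)


def wait_until_interrupted(job, wait=signal.pause):
    """Block until ctrl-c arrives, then cancel the job.

    signal.pause is Unix-only; Python turns SIGINT into KeyboardInterrupt.
    """
    try:
        wait()
    except KeyboardInterrupt:
        return job.kill()
```

Because the engines live in one job array, a single cancel call is enough. With N separate submissions the launcher would have to track and cancel N job ids.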
> * For now, let's stick with the assumption of a shared $HOME for the furl files.
> * The biggest thing is if people can test this thoroughly. I don't have
> SGE/PBS/LSF access right now, so it is a bit difficult for me to help. I
> have a cluster coming later in the summer, but it is not here yet. Once
> people have tested it well and are satisfied with it, let's merge it.
> * If we can update the documentation about how the PBS/SGE support works
> that would be great. The file is here:
That sounds fine to me. I'm testing this stuff on my workstation's local
SGE/Torque queues and it works fine. I'll also test this with
StarCluster and make sure it works on a real cluster. If someone else
can test using LSF on a real cluster (with shared $HOME) that'd be
great. I'll try to update the docs some time this week.
> Once these small changes have been made and everyone has tested, we
> can merge it for the 0.10.1 release.
> Thanks for doing this work Justin and Satra! It is fantastic! Just
> so you all know where this is going in 0.11:
> * We are going to get rid of using Twisted in ipcluster. This means we have
> to re-write the process management stuff to use things like popen.
> * We have a new configuration system in 0.11. This allows users to maintain
> cluster profiles that are a set of configuration files for a particular
> cluster setup. This makes it easy for a user to have multiple clusters
> configured, which they can then start by name. The logging, security, etc.
> is also different for each cluster profile.
> * It will be quite a bit of work to get everything working in 0.11, so I am
> glad we are getting good PBS/SGE support in 0.10.1.
I'm willing to help out with the PBS/SGE/LSF portion of ipcluster in
0.11. Just let me know when it's appropriate to start hacking.
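
As a rough idea of what popen-style process management without Twisted could look like, here is a minimal sketch. It is not the 0.11 design: the LocalProcessLauncher class and its methods are hypothetical, and it assumes Python 3's subprocess module (the timeout argument to wait() does not exist in Python 2.6).

```python
import subprocess
import sys


class LocalProcessLauncher:
    """Hypothetical launcher that starts and stops one child process."""

    def __init__(self, args):
        self.args = list(args)
        self.process = None

    def start(self):
        # Popen returns immediately; the child runs in the background.
        self.process = subprocess.Popen(self.args)
        return self.process.pid

    @property
    def running(self):
        # poll() returns None while the child is still alive.
        return self.process is not None and self.process.poll() is None

    def stop(self, timeout=5.0):
        if not self.running:
            return
        # Ask politely first, then force-kill if the child does not exit.
        self.process.terminate()
        try:
            self.process.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self.process.kill()
            self.process.wait()


if __name__ == "__main__":
    launcher = LocalProcessLauncher(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    launcher.start()
    launcher.stop()
```

A batch-system launcher could keep the same start/stop interface and run qsub and qdel in place of a local child process.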