[IPython-dev] SciPy Sprint summary

Satrajit Ghosh satra@mit....
Wed Jul 21 20:05:25 CDT 2010


hi justin.

i really don't know what the difference is, but i clean installed everything
and it works beautifully on SGE.

cheers,

satra


On Tue, Jul 20, 2010 at 4:04 PM, Brian Granger <ellisonbg@gmail.com> wrote:

> Great!  I mean great that you and Justin are testing and debugging this.
>
> Brian
>
> On Tue, Jul 20, 2010 at 1:01 PM, Satrajit Ghosh <satra@mit.edu> wrote:
> > hi brian,
> >
> > i ran into a problem (my engines were not starting) and justin and i are
> > going to try and figure out what's causing it.
> >
> > cheers,
> >
> > satra
> >
> >
> > On Tue, Jul 20, 2010 at 3:19 PM, Brian Granger <ellisonbg@gmail.com> wrote:
> >>
> >> Satra,
> >>
> >> If you could test this as well, that would be great.  Thanks.  Justin,
> >> let us know when you think it is ready to go with the documentation
> >> and testing.
> >>
> >> Cheers,
> >>
> >> Brian
> >>
> >> On Tue, Jul 20, 2010 at 7:48 AM, Justin Riley <justin.t.riley@gmail.com> wrote:
> >> > On 07/19/2010 01:06 AM, Brian Granger wrote:
> >> >> * I like the design of the BatchEngineSet.  This will be easy to
> >> >>   port to 0.11.
> >> > Excellent :D
> >> >
> >> >> * I think if we are going to have default submission templates, we
> >> >>   need to expose the queue name to the command line.  This
> >> >>   shouldn't be too tough.
> >> >
> >> > Added --queue option to my 0.10.1-sge branch and tested this with SGE
> >> > 62u3 and Torque 2.4.6. I don't have LSF to test but I added in the
> >> > code that *should* work with LSF.
> >> >
> >> >> * Have you tested this with Python 2.6?  I saw that you mentioned
> >> >>   that the engines were shutting down cleanly now.  What did you do
> >> >>   to fix that?  I am even running into that in 0.11 so any info you
> >> >>   can provide would be helpful.
> >> >
> >> > I've been testing the code with Python 2.6. I didn't do anything
> >> > special other than switch the BatchEngineSet to using job arrays
> >> > (i.e. a single qsub command instead of N qsubs). Now when I run
> >> > "ipcluster sge -n 4" the controller starts and the engines are
> >> > launched, and at that point the ipcluster session runs indefinitely.
> >> > If I then ctrl-c the ipcluster session it catches the signal and
> >> > calls kill(), which terminates the engines by canceling the job. Is
> >> > this the same situation you're trying to get working?
> >> >
> >> >> * For now, let's stick with the assumption of a shared $HOME for the
> >> >>   furl files.
> >> >> * The biggest thing is if people can test this thoroughly.  I don't
> >> >>   have SGE/PBS/LSF access right now, so it is a bit difficult for me
> >> >>   to help.  I have a cluster coming later in the summer, but it is
> >> >>   not here yet.  Once people have tested it well and are satisfied
> >> >>   with it, let's merge it.
> >> >> * If we can update the documentation about how the PBS/SGE support
> >> >>   works that would be great.  The file is here:
> >> >
> >> > That sounds fine to me. I'm testing this stuff on my workstation's
> >> > local sge/torque queues and it works fine. I'll also test this with
> >> > StarCluster and make sure it works on a real cluster. If someone else
> >> > can test using LSF on a real cluster (with shared $HOME) that'd be
> >> > great. I'll try to update the docs some time this week.
> >> >
> >> >>
> >> >> Once these small changes have been made and everyone has tested, we
> >> >> can merge it for the 0.10.1 release.
> >> > Excellent :D
> >> >
> >> >> Thanks for doing this work Justin and Satra!  It is fantastic!  Just
> >> >> so you all know where this is going in 0.11:
> >> >>
> >> >> * We are going to get rid of using Twisted in ipcluster.  This means
> >> >>   we have to re-write the process management stuff to use things
> >> >>   like popen.
> >> >> * We have a new configuration system in 0.11.  This allows users to
> >> >>   maintain cluster profiles that are a set of configuration files for
> >> >>   a particular cluster setup.  This makes it easy for a user to have
> >> >>   multiple clusters configured, which they can then start by name.
> >> >>   The logging, security, etc. is also different for each cluster
> >> >>   profile.
> >> >> * It will be quite a bit of work to get everything working in 0.11,
> >> >>   so I am glad we are getting good PBS/SGE support in 0.10.1.
> >> >
> >> > I'm willing to help out with the PBS/SGE/LSF portion of ipcluster in
> >> > 0.11, I guess just let me know when is appropriate to start hacking.
> >> >
> >> > Thanks!
> >> >
> >> > ~Justin
> >> >
> >>
> >>
> >>
> >> --
> >> Brian E. Granger, Ph.D.
> >> Assistant Professor of Physics
> >> Cal Poly State University, San Luis Obispo
> >> bgranger@calpoly.edu
> >> ellisonbg@gmail.com
> >
> >
>
>
>
> --
> Brian E. Granger, Ph.D.
> Assistant Professor of Physics
> Cal Poly State University, San Luis Obispo
> bgranger@calpoly.edu
> ellisonbg@gmail.com
>
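
For readers following the thread, the job-array approach Justin describes
(one qsub for N engines, a queue name exposed as an option, and a kill()
that cancels the whole job on ctrl-c) can be sketched roughly as below.
This is only an illustration under stated assumptions, not the code in the
0.10.1-sge branch: the template text, build_sge_script, BatchEngineSetSketch
and run_until_interrupted are all hypothetical names, and the submit/cancel
callables stand in for whatever actually invokes qsub and qdel. The "#$ -t",
"#$ -q", "#$ -V" and "#$ -cwd" lines are standard SGE directives.

```python
# Hypothetical submission template. Lines starting with "#$" are SGE
# directives; "-t 1-N" requests a job array of N tasks in one submission.
SGE_TEMPLATE = """#!/bin/sh
#$ -V
#$ -cwd
#$ -t 1-{n}
{queue_line}
ipengine
"""


def build_sge_script(n, queue=None):
    """Return the text of a job-array script that starts n engines."""
    if n < 1:
        raise ValueError("need at least one engine")
    # Only emit a queue directive when a queue name was given.
    queue_line = "#$ -q {0}".format(queue) if queue else ""
    return SGE_TEMPLATE.format(n=n, queue_line=queue_line)


class BatchEngineSetSketch(object):
    """Illustrative stand-in for an engine set backed by one array job.

    submit(script_text) is assumed to hand the script to the scheduler
    (e.g. via qsub) and return a job id; cancel(job_id) is assumed to
    cancel that job (e.g. via qdel). Both are injected so the sketch can
    run without a scheduler installed.
    """

    def __init__(self, n, queue=None, submit=None, cancel=None):
        self.script = build_sge_script(n, queue)
        self._submit = submit
        self._cancel = cancel
        self.job_id = None

    def start(self):
        # One submission covers all n engines because of the job array.
        self.job_id = self._submit(self.script)
        return self.job_id

    def kill(self):
        # Cancelling the single array job stops every engine at once.
        if self.job_id is not None:
            self._cancel(self.job_id)
            self.job_id = None


def run_until_interrupted(engine_set, wait):
    """Start the engines, block in wait(), and clean up on ctrl-c."""
    engine_set.start()
    try:
        wait()
    except KeyboardInterrupt:
        # ctrl-c arrives as KeyboardInterrupt; fall through to cleanup.
        pass
    finally:
        engine_set.kill()


if __name__ == "__main__":
    cancelled = []

    def fake_wait():
        # Simulate the user pressing ctrl-c right away.
        raise KeyboardInterrupt

    engines = BatchEngineSetSketch(
        4, queue="all.q", submit=lambda script: "42", cancel=cancelled.append
    )
    run_until_interrupted(engines, fake_wait)
    print(engines.script)
    print(cancelled)
```

Injecting submit and cancel keeps the scheduler-specific parts (SGE, Torque,
LSF) out of the control flow, which is one way the same engine set could be
pointed at different batch systems.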