[IPython-User] Running multiple ipclusters on remote cluster w/ Sun Grid Engine.
Wed Sep 14 16:32:31 CDT 2011
It may be more a case of differing nomenclature. To me, a profile is something you set up once that applies to a class of things, e.g., within SGE we have a parallel environment (or profile) called mpich, and when we tell any script to use that particular parallel environment, it sets things up a certain way. When you actually submit a job to SGE using that profile, it gets a jobid, which is what you use to track or kill the actual job.
The 1-1 correspondence makes sense if you plan to have the ipcluster running continuously on a certain number of cluster nodes and keep connecting and disconnecting with local IPython clients.
To me, the use case that makes sense is different. We submit a job to run on a certain number of nodes, and after the job is completed, the nodes are released for other non-IPython runs like our Fortran hydro models. In that case, the 'profile' is what tells it how to submit a job to the SGE queue, etc., and the job-id or controller-id is what we use to run or kill the job. Maybe the --controller-id flag could be an optional parameter.
Another feature request is some way of knowing when the engines have all started up; depending on how busy the SGE queue is, the engines may not start up immediately. Right now, I'm using a while loop that checks for the presence of the JSON file every 5 seconds. This works but seems inelegant.
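The polling loop described above could be sketched roughly as follows. The function name, timeout value, and connection-file path are assumptions for illustration, not part of IPython's API:

```python
import os
import time

def wait_for_connection_file(path, timeout=300, poll_interval=5):
    """Poll until the cluster's JSON connection file appears.

    Hypothetical helper: returns True once the file exists,
    False if `timeout` seconds pass without it appearing.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_interval)
    return False
```

A blocking wait like this at least centralizes the timeout logic, though an event-based notification from ipcluster itself would clearly be cleaner.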
Let me know if this use case makes sense or if I'm missing something in the way these features were designed to be used.
>>> MinRK <email@example.com> 9/14/2011 2:12 PM >>>
On Wed, Sep 14, 2011 at 11:13, Fernando Perez <firstname.lastname@example.org> wrote:
On Wed, Sep 14, 2011 at 6:59 AM, Dharhas Pothina
> I ended up writing a script that connected to the cluster and made a copy of
> an already created profile with a new unique name, started ipcluster, waited
> till the json file was created and then retrieved the json file for use in a
> local client, runs my script and then cleans up afterwards.
> This seems to be working fairly well except when the local script exits
> because of an error. In that case, I need to log in and stop the engines,
> clean up files etc manually.
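The workflow quoted above (start a remote cluster, run a local script, then clean up) could wrap the cleanup in a try/finally so engines are stopped even when the script errors out. This is only a sketch: the remote host, profile name, and the exact ipcluster flags are assumptions to adapt to your setup, and `call` is injectable so the pattern can be exercised without SSH:

```python
import subprocess

def run_with_cleanup(profile, run_script, remote="user@cluster",
                     call=subprocess.check_call):
    """Start a remote ipcluster, run a local script, and always stop
    the cluster afterwards, even if run_script() raises.

    `remote`, `profile`, and the ipcluster flags are illustrative
    assumptions, not guaranteed to match any particular IPython version.
    """
    call(["ssh", remote, "ipcluster", "start",
          "--profile=" + profile, "--daemonize"])
    try:
        return run_script()
    finally:
        # Runs on both normal exit and exceptions, so no manual
        # login is needed to stop the engines.
        call(["ssh", remote, "ipcluster", "stop",
              "--profile=" + profile])
```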
OK. We probably should remove the assumption of a 1 to 1 mapping
between profiles and running clusters, but that will require a fair
bit of reorganization of code that uses that assumption, so I'm glad
you found a solution for now.
Yes, it's a pretty big deal that the only thing engines and clients need to know to connect to a cluster is the profile name. That is lost entirely if we allow multiple clusters with a single profile, since profile name becomes ambiguous. We would then need to add a second layer of specification for which controller to use within a given profile, e.g.:
ipengine --profile=mysge --controller-id=12345
I think I could add support for exactly this without much code change at all, though.
Feature Request opened on GitHub: https://github.com/ipython/ipython/issues/794