[IPython-user] Fwd: Trouble importing my own modules?

Brian Granger ellisonbg.net@gmail....
Thu Jun 14 00:38:52 CDT 2007


---------- Forwarded message ----------
From: Greg Novak <novak@ucolick.org>
Date: Jun 13, 2007 1:37 AM
Subject: Re: [IPython-user] Trouble importing my own modules?
To: Brian Granger <ellisonbg.net@gmail.com>


I noticed that you didn't cc: the list.  If you meant to, feel free to
bounce this message to the list too.

I appreciate your comments on this.  It does seem that we're excited
about two different things.  That's not bad, just something that I
noticed.

Collaborative computing as you describe sounds interesting and
definitely wide open.  For the sake of clarity (not advocacy) I'm
going to try to clearly describe what I envision.  I've tried this
with others already and I've been surprised to meet with some
resistance.  It just seems obvious to me that this is a great idea, so
I can only imagine that before I failed to explain it clearly.

Say I do data analysis on my good old one processor desktop.  I have a
function that takes two minutes to crunch through a little data and
then draws a plot.  Two minutes is a long time to wait--definitely
non-interactive.  I lose my train of thought while I'm waiting.

Now say I've got 100 scientists all doing the same thing.  They each
click, wait two minutes, look at the plot, make a change, click again,
wait another two minutes, etc.  Each bit of computation is done in a
"long, skinny" fashion--that is, each person uses a single CPU and
waits a long time.  A lot of computation happens because there are 100
people doing this on different computers, but each one of them has a
pretty annoying experience, spending a lot of time waiting.

Now, suppose that there are great tools that make it drop-dead easy to
farm your computation out onto a 100 processor cluster, via
TaskController, map-reduce, or whatever.  Now the two-minute plot
takes about a second.  You've entered the regime of interactivity.
 So every person feels like they have a 100 processor cluster sitting
under their desk.

The kicker for me is that you've done nothing but rearrange your
computational resources from a bunch of "long, skinny" pipes to a
single "short, fat" pipe.  You haven't changed the computational
capacity, but you've made a dramatic difference in latency.  If you've
got multiple people using the machine, they basically fill in each
other's idle time, so the machine can still be fully utilized.
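
The "long, skinny" versus "short, fat" argument can be checked with a
little arithmetic.  This is only a sketch under idealized assumptions
that are not in the message above: perfect linear speedup, no
scheduling or communication overhead, and a hypothetical helper named
latency_and_throughput.

```python
def latency_and_throughput(n_cpus, job_cpu_seconds, cpus_per_job):
    """Idealized model: perfect linear speedup and zero overhead (assumption)."""
    latency = job_cpu_seconds / cpus_per_job   # wall-clock wait for one job
    concurrent_jobs = n_cpus // cpus_per_job   # jobs that can run side by side
    throughput = concurrent_jobs / latency     # jobs completed per second
    return latency, throughput


# 100 CPUs and a job that needs 120 CPU-seconds (the two-minute plot).
skinny = latency_and_throughput(100, 120.0, cpus_per_job=1)
fat = latency_and_throughput(100, 120.0, cpus_per_job=100)

print("long, skinny: %.1f s per job, %.3f jobs/s" % skinny)
print("short, fat:   %.1f s per job, %.3f jobs/s" % fat)
```

In this model the capacity (jobs per second) is identical in both
layouts; only the wait per job changes, from 120 s to 1.2 s.  Real
speedups will fall short of this ideal.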

There are three obstacles to making this happen in practice.  Number
one is writing the code, and that's in some sense the easy part.
Number two is that you have to convince people to allow a common
resource to be utilized in this way, or partly in this way.  Number
three is that it really does have to be drop-dead easy to make your
code talk to the machine in this way in order for people to actually
do it.

The second one is, I think, the hard part.  The point that a machine
set up this way can still be fully or nearly fully utilized is
important for convincing the people who control such things in my neck
of the woods that this is a good idea.

Now, it would be nice if you could just have one engine for each
processor and still manage not to have users step on each other.  But
it seems like that's asking for trouble, and it's probably more robust
to have multiple engines sitting on each processor, one for each user.
 It's important to me that there be multiple engines per CPU in this
case since I want different people to be able to fill in each other's
idle time.  Then the controller could be in charge of making sure
that it doesn't dispatch jobs to a CPU that's already running
something for a _different_ controller.  A down side of this is that
you'd pay in memory for having multiple engines on each CPU.  That's
probably not a big deal for small/medium size clusters but doesn't
scale to very large clusters.
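
As a rough illustration of that bookkeeping, here is a minimal sketch.
Everything in it is hypothetical (the CpuBoard class, the engine_count
helper, the user names), and it assumes at most one running job per
CPU; it is not how the IPython controller works.

```python
class CpuBoard:
    """Hypothetical shared record of which user's job occupies each CPU."""

    def __init__(self, n_cpus):
        self.owner = [None] * n_cpus  # None means the CPU is idle

    def acquire(self, user):
        """Claim the first idle CPU for `user`; return its index, or None if all are busy."""
        for cpu, owner in enumerate(self.owner):
            if owner is None:
                self.owner[cpu] = user
                return cpu
        return None

    def release(self, cpu, user):
        """Mark `cpu` idle again; only the user who claimed it may release it."""
        if self.owner[cpu] != user:
            raise ValueError("CPU %d is not held by %s" % (cpu, user))
        self.owner[cpu] = None


def engine_count(n_users, n_cpus):
    """One engine per user per CPU: the memory cost grows with both factors."""
    return n_users * n_cpus


board = CpuBoard(n_cpus=2)
first = board.acquire("alice")    # idle CPU 0 goes to alice
second = board.acquire("bob")     # CPU 1 goes to bob
third = board.acquire("alice")    # everything is busy, so None
board.release(first, "alice")
fourth = board.acquire("bob")     # bob fills in the CPU alice just freed
```

With 10 users on a 128-CPU cluster, engine_count(10, 128) gives 1280
resident engines, which is the memory cost mentioned above.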

Anyway, I'm interested in making this happen.  I don't know when I'll
get to it, but I'll let you know if I have any successes.

Take care,
Greg

On 6/12/07, Brian Granger <ellisonbg.net@gmail.com> wrote:
> > On 6/12/07, Brian Granger <ellisonbg.net@gmail.com> wrote:
> > > Currently, multiple users can connect to a single controller.  As
> > > Fernando mentioned, this is something we have had in mind all along.
> > > The only thing that needs to be worked on is the security model.
> > > Currently there is no authentication scheme used.  But that is on our
> > > list of things to do.
> >
> > There's security and there's also the environment.  That is, some
> > users will be working together on the same project.  They may want to
> > have access to some common data, and also have some private workspace
> > so that they don't step on each other's variables.  Other people may
> > have only private data and want to pretend that there are no other
> > users of the system.
>
> Ahhh.  So the way we have been thinking about this is the following:
>
> * When multiple users connect to the same set of engines, they do have
> their own private workspace, namely, the client ipython/python session
> they are using to talk to the engines.
>
> * Users would only connect to the same set of engines specifically
> because they want to share a parallel workspace.  Parts of parallel
> code that different users need to run in a "private" manner would
> simply be run on different sets of engines.
>
> * An underlying assumption is that a controller + engines is a very
> lightweight on demand entity.  Thus on a 128 node cluster, we don't
> imagine simply having one controller and 128 engines that are always
> on.  We much more imagine that controllers+sets of engines come and go
> as often as user needs demand - and this overall scheduling would be
> handled by some sort of batch system - like Xgrid, PBS, etc.
>
> * We have thus far avoided having multiple different namespaces within
> a single engine.  But.... we have thought about the fact that some
> users might want this capability.  If we see that this need is really
> there, we would be willing to add this - but because it adds a whole
> new level of complexity to the (already complicated) system, we don't
> want to go there unless we really need to.
>
> If we did go that way, the engines would have methods that look something like:
>
> rc.createNamespace(namespaceKey)
> rc.execute(engineID, code, namespaceKey)
> similar for push, pull, etc.
> rc.setActiveNamespace(engineID, namespaceKey)
>
> Then you also might want the ability to move/copy objects between namespaces.
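
A minimal sketch of what keyed namespaces inside one engine might look
like, using one globals dict per key.  The NamespacedEngine class and
its method names are hypothetical, written for modern Python; they are
not part of the RemoteController interface discussed here.

```python
class NamespacedEngine:
    """Hypothetical engine that keeps one globals dict per namespace key."""

    def __init__(self):
        self.namespaces = {}
        self.active = None

    def create_namespace(self, key):
        self.namespaces.setdefault(key, {})

    def set_active_namespace(self, key):
        if key not in self.namespaces:
            raise KeyError(key)
        self.active = key

    def _resolve(self, key):
        # Fall back to the active namespace when no key is given.
        return self.namespaces[self.active if key is None else key]

    def execute(self, code, key=None):
        exec(code, self._resolve(key))

    def pull(self, name, key=None):
        return self._resolve(key)[name]

    def copy(self, name, src, dst):
        """Copy a reference to one object from namespace `src` into `dst`."""
        self.namespaces[dst][name] = self.namespaces[src][name]


engine = NamespacedEngine()
engine.create_namespace("alice")
engine.create_namespace("bob")
engine.execute("x = 1", "alice")
engine.execute("x = 2", "bob")        # does not clobber alice's x
engine.execute("y = x * 10", "bob")
engine.copy("y", "bob", "alice")      # share one object explicitly
engine.set_active_namespace("alice")
```

Note that separate dicts only isolate variable names.  Imported
modules, the working directory and other process-wide state are still
shared by every namespace in the engine.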
>
> > One thing that was a bit of a disappointment to me was that with the
> > RemoteController I have to give something a name to be able to get the
> > value back.  I'd like to do something like:
> >
> > retval = rc.executeAll('somefunc(somedata)')
> > print retval['value']   # This is whatever somefunc returned.
> >
> > But in fact i have to do:
> >
> > rc.executeAll('retval = somefunc(somedata)')
> > print rc['retval']
>
> This has come up before.  There is a specific reason we didn't go that
> route:  Having executeAll return an actual python object (in your
> example, the return value of the function) is a serious performance
> pitfall.  It forces objects to be sent over the wire - even if the
> user doesn't need to use them locally.  By making push/execute/pull
> separate and explicit it forces people to really make sure they want
> to bring an object back before doing it.  With that said Fernando has
> advocated that we add this type of syntax (in addition to having
> execute) to the RemoteController interface, so it might happen in any
> case.  I am probably in favor of keeping the interface as simple as
> possible without being handicapped.
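
The cost being described can be illustrated with a toy stand-in for an
engine that counts the bytes a pull would put on the wire.  FakeEngine
is hypothetical and uses pickle purely for illustration; it says
nothing about how IPython actually serializes objects.

```python
import pickle


class FakeEngine:
    """Hypothetical stand-in for a remote engine; counts bytes that would cross the wire."""

    def __init__(self):
        self.namespace = {}
        self.bytes_sent = 0

    def execute(self, code):
        exec(code, self.namespace)        # results stay on the engine

    def pull(self, name):
        payload = pickle.dumps(self.namespace[name])
        self.bytes_sent += len(payload)   # only an explicit pull costs bandwidth
        return pickle.loads(payload)


engine = FakeEngine()
engine.execute("big = list(range(100000)); total = sum(big)")

total = engine.pull("total")              # a single small integer
small_cost = engine.bytes_sent

big = engine.pull("big")                  # the whole list crosses the wire
large_cost = engine.bytes_sent - small_cost
```

If execute returned its value automatically, every call that produced
something like big would pay large_cost whether or not the caller
wanted the data locally.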
>
> > That's the reason I was poking around in the ipython code--I wanted to
> > figure out how to get 'value' into the dict returned by rc.execute, in
> > addition to stdout and stdin.
> >
> > As far as I was able to see without spending too much time, this is a
> > problem with python, not with ipython, since the bits of python to
> > which the code string is passed handle the string a line at a time,
> > and each line may not be an executable chunk (might be the first line
> > of a for loop) and not every executable chunk produces a value (might
> > be a statement, not an expression).
>
> This is about to change (like within the week).  In the new approach
> entire sections of python code are compiled into an AST and run
> as complete code blocks.  In the new system, incomplete lines of code
> will immediately raise a SyntaxError.
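
For readers curious how a whole block can be run while still handing
back a value, here is a sketch using the standard library's ast module
in modern Python.  The run_block helper is hypothetical and is not the
IPython implementation; it simply returns the value of the block's
final expression, if there is one.

```python
import ast


def run_block(source, namespace):
    """Run `source` as one block; return the value of a trailing expression, else None."""
    tree = ast.parse(source, mode="exec")   # incomplete code raises SyntaxError here
    last_expr = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the final expression so it can be evaluated for its value.
        last_expr = ast.Expression(tree.body.pop().value)
    exec(compile(tree, "<block>", "exec"), namespace)
    if last_expr is None:
        return None                          # the block ended with a statement
    return eval(compile(last_expr, "<block>", "eval"), namespace)


ns = {}
value = run_block("x = 2\nfor i in range(3):\n    x += 1\nx * 10", ns)
nothing = run_block("y = 1", ns)
```

Because the block is parsed as a unit, a multi-line for loop is fine,
while a dangling "for i in range(3):" fails up front with a
SyntaxError.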
>
> > I mention this because once things need to have names to get back to
> > the controller, there's the possibility of users stepping on each
> > others names.
>
> Absolutely - but at some level, this is the price of being able to
> share data.  It is the same as if you and I "share" a dollar - trust
> and communication are required for it to work.
>
> > Then there's the issue of environment as it relates to code, not data.
> >  If I load up a bunch of python modules that I'm working on/debugging,
> > I'm constantly going to be reloading them as I change them.  That's
> > fine as it goes, but I've definitely gotten myself into situations
> > where my code was behaving strangely and the easiest thing was to
> > restart python rather than try to figure out which reload I missed.
> > If there are other users that'd be impossible.  It'd be nice if there
> > was something that would give me a clean slate, or delete all the
> > modules that I've loaded so that I get fresh copies when I import
> > them.  I'm thinking of some kind of escape hatch where the user can
> > say "I can't figure it out, start over"
>
> We do have a reset method that clears the user's namespace.  But
> because of how python itself handles imported modules (they are cached
> for each process) "deleting" modules is not possible.  But, you can
> always try to reload them.
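
The per-process cache in question is sys.modules.  The sketch below,
written for modern Python with importlib, shows the cache serving a
stale module, a reload picking up an edit, and the blunter option of
dropping the cache entry so the next import starts fresh.  The module
name scratch_demo_mod is made up for the demo.  Dropping the entry does
not update references other code already holds, and extension modules
may not tolerate being re-imported this way.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True            # keep stale .pyc files out of this demo
tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)
path = os.path.join(tmpdir, "scratch_demo_mod.py")   # hypothetical module

with open(path, "w") as f:
    f.write("VALUE = 'old'\n")
importlib.invalidate_caches()             # make the import system notice the new file
mod = importlib.import_module("scratch_demo_mod")
first = mod.VALUE

with open(path, "w") as f:                # simulate editing the module on disk
    f.write("VALUE = 'edited'\n")

# A plain import is served from sys.modules and does not see the edit.
cached = importlib.import_module("scratch_demo_mod").VALUE

# reload() re-executes the source inside the existing module object.
reloaded = importlib.reload(mod).VALUE

# Dropping the cache entry forces the next import to build a new module object.
del sys.modules["scratch_demo_mod"]
fresh = importlib.import_module("scratch_demo_mod")
```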
>
> As far as multiple users are concerned, I am not worried about that.
> This goes back to my assumption that users will only share an
> engine if they absolutely need to.  Thus if we are working together on
> code and you need to restart the engines, I will know immediately
> because we will likely be on the phone/email/irc.
>
> > Also people may want the working directory to be different.
> >
> > Finally, people could do strange things like mess with python's own
> > modules (changing os.path.sep, for instance) which would throw a
> > wrench in other people's code.
>
> Again, I think the same applies to these situations.  In my view,
> IPython engines are like beds, you don't typically share one with
> someone else unless both parties _really_ want to.
>
>
> > Anyway, this is what I've been thinking about off-and-on for the past
> > few days.  I offer it as food for thought.
>
> I appreciate the thoughts, they are helpful for us as we think about
> where to go next.  At some level, this stuff is really wide open.
> There hasn't been much research done on truly collaborative computing
> systems - let alone ones that are parallel.  Let us know if you have
> other ideas or further comments on these ones.
>
> Cheers,
>
> Brian
>
> > Greg
> > _______________________________________________
> > IPython-user mailing list
> > IPython-user@scipy.org
> > http://lists.ipython.scipy.org/mailman/listinfo/ipython-user
> >
>

