[IPython-user] Fwd: Trouble importing my own modules?

Brian Granger ellisonbg.net@gmail....
Thu Jun 14 00:35:49 CDT 2007


> On 6/12/07, Brian Granger <ellisonbg.net@gmail.com> wrote:
> > Currently, multiple users can connect to a single controller.  As
> > Fernando mentioned, this is something we have had in mind all along.
> > The only thing that needs to be worked on is the security model.
> > Currently there is no authentication scheme used.  But that is on our
> > list of things to do.
>
> There's security and there's also the environment.  That is, some
> users will be working together on the same project.  They may want to
> have access to some common data, and also have some private workspace
> so that they don't step on each other's variables.  Other people may
> have only private data and want to pretend that there are no other
> users of the system.

Ahhh.  So the way we have been thinking about this is the following:

* When multiple users connect to the same set of engines, they do have
their own private workspace, namely, the client ipython/python session
they are using to talk to the engines.

* Users would only connect to the same set of engines specifically
because they want to share a parallel workspace.  Parts of parallel
code that different users need to run in a "private" manner would
simply be run on different sets of engines.

* An underlying assumption is that a controller + engines is a very
lightweight, on-demand entity.  Thus on a 128 node cluster, we don't
imagine simply having one controller and 128 engines that are always
on.  Rather, we imagine that controllers + sets of engines come and go
as often as user needs demand - and this overall scheduling would be
handled by some sort of batch system - like Xgrid, PBS, etc.

* We have thus far avoided having multiple different namespaces within
a single engine.  But we have thought about the fact that some
users might want this capability.  If we see that this need is really
there, we would be willing to add this - but because it adds a whole
new level of complexity to the (already complicated) system, we don't
want to go there unless we really need to.

If we did go that way, the engines would have methods that look something like:

rc.createNamespace(namespaceKey)
rc.execute(engineID, code, namespaceKey)
similar for push, pull, etc.
rc.setActiveNamespace(engineID, namespaceKey)

Then you also might want the ability to move/copy objects between namespaces.
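
As a rough illustration of the idea (not IPython code), per-key
namespaces inside one engine could be modelled as a dictionary of
dictionaries.  The class and method names below are hypothetical and
exist only for this sketch; a real engine would also need push/pull
over the network and error handling.

```python
class NamespacedEngine:
    """Toy stand-in for a single engine holding several namespaces."""

    def __init__(self):
        # Hypothetical layout: one plain dict per namespace key.
        self._namespaces = {"default": {}}
        self._active = "default"

    def create_namespace(self, key):
        self._namespaces.setdefault(key, {})

    def set_active_namespace(self, key):
        if key not in self._namespaces:
            raise KeyError(key)
        self._active = key

    def _resolve(self, key):
        # Fall back to the active namespace when no key is given.
        return self._namespaces[self._active if key is None else key]

    def execute(self, code, key=None):
        exec(code, self._resolve(key))

    def pull(self, name, key=None):
        return self._resolve(key)[name]

    def copy(self, name, src, dst):
        # Copies the reference only; both namespaces see the same object.
        self._namespaces[dst][name] = self._namespaces[src][name]


engine = NamespacedEngine()
engine.create_namespace("alice")
engine.create_namespace("bob")
engine.execute("x = 1", "alice")
engine.execute("x = 2", "bob")      # does not disturb alice's x
engine.copy("x", "alice", "default")
```

Even in this toy form, every operation grows an extra key argument,
which gives a feel for the added complexity mentioned above.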

> One thing that was a bit of a disappointment to me was that with the
> RemoteController I have to give something a name to be able to get the
> value back.  I'd like to do something like:
>
> retval = rc.executeAll('somefunc(somedata)')
> print retval['value']   # This is whatever somefunc returned.
>
> But in fact i have to do:
>
> rc.executeAll('retval = somefunc(somedata)')
> print rc['retval']

This has come up before.  There is a specific reason we didn't go that
route:  Having executeAll return an actual python object (in your
example, the return value of the function) is a serious performance
pitfall.  It forces objects to be sent over the wire - even if the
user doesn't need to use them locally.  By making push/execute/pull
separate and explicit, it forces people to really make sure they want
to bring an object back before doing it.  With that said, Fernando has
advocated that we add this type of syntax (in addition to having
execute) to the RemoteController interface, so it might happen in any
case.  I am probably in favor of keeping the interface as simple as
possible without being handicapped.
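
The cost argument can be sketched with a toy engine (hypothetical
names, not the RemoteController API).  It assumes objects cross the
wire as pickles and simply counts the bytes that an explicit pull
would send back; execute itself returns nothing.

```python
import pickle


class ToyEngine:
    """Hypothetical engine: execute never returns data, pull does."""

    def __init__(self):
        self.namespace = {}
        self.bytes_sent_back = 0

    def push(self, **objects):
        self.namespace.update(objects)

    def execute(self, code):
        # Results stay on the engine; nothing is serialized here.
        exec(code, self.namespace)

    def pull(self, name):
        # Only an explicit pull pays the serialization/transfer cost.
        payload = pickle.dumps(self.namespace[name])
        self.bytes_sent_back += len(payload)
        return pickle.loads(payload)


engine = ToyEngine()
engine.push(n=100_000)
engine.execute("big = list(range(n)); total = sum(big)")
total = engine.pull("total")   # small integer comes back
# 'big' stays on the engine unless someone explicitly pulls it.
```

If execute returned its value automatically, the large intermediate
list would be shipped whenever it happened to be the last expression,
whether or not the user wanted it locally.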

> That's the reason I was poking around in the ipython code--I wanted to
> figure out how to get 'value' into the dict returned by rc.execute, in
> addition to stdout and stdin.
>
> As far as I was able to see without spending too much time, this is a
> problem with python, not with ipython, since the bits of python to
> which the code string is passed handle the string a line at a time,
> and each line may not be an executable chunk (might be the first line
> of a for loop) and not every executable chunk produces a value (might
> be a statement, not an expression).

This is about to change (like within the week).  In the new approach,
entire sections of python code are compiled into an AST and run
as complete code blocks.  In the new system, incomplete lines of code
will immediately raise a SyntaxError.
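
A minimal sketch of block-at-a-time execution using the standard
library ast module follows.  It is an assumption about how such a
scheme could look, not the code that will land in IPython; the helper
name run_block is hypothetical.  It also shows one way the value of a
trailing expression could be recovered, which the approach above does
not promise.

```python
import ast


def run_block(source, namespace):
    """Parse the whole block first, then run it as one unit.

    Returns the value of a trailing expression, or None if the block
    ends in a statement.  Incomplete code fails at parse time.
    """
    tree = ast.parse(source, mode="exec")   # raises SyntaxError if incomplete
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the final expression so it can be evaluated for a value.
        last = ast.Expression(tree.body.pop().value)
        exec(compile(tree, "<block>", "exec"), namespace)
        return eval(compile(last, "<block>", "eval"), namespace)
    exec(compile(tree, "<block>", "exec"), namespace)
    return None


ns = {}
value = run_block("total = 0\nfor i in range(4):\n    total += i\ntotal", ns)

try:
    run_block("for i in range(3):", {})     # loop header with no body
    incomplete_rejected = False
except SyntaxError:
    incomplete_rejected = True
```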

> I mention this because once things need to have names to get back to
> the controller, there's the possibility of users stepping on each
> others names.

Absolutely - but at some level, this is the price of being able to
share data.  It is the same as if you and I "share" a dollar - trust
and communication are required for it to work.

> Then there's the issue of environment as it relates to code, not data.
>  If I load up a bunch of python modules that I'm working on/debugging,
> I'm constantly going to be reloading them as I change them.  That's
> fine as it goes, but I've definitely gotten myself into situations
> where my code was behaving strangely and the easiest thing was to
> restart python rather than try to figure out which reload I missed.
> If there are other users that'd be impossible.  It'd be nice if there
> was something that would give me a clean slate, or delete all the
> modules that I've loaded so that I get fresh copies when I import
> them.  I'm thinking of some kind of escape hatch where the user can
> say "I can't figure it out, start over"

We do have a reset method that clears the user's namespace.  But
because of how python itself handles imported modules (they are cached
for each process), "deleting" modules is not possible.  But you can
always try to reload them.
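
The per-process module cache can be seen in a short, self-contained
sketch.  The module name scratch_mod and the temporary directory are
made up for the demonstration, and importlib.reload is the modern
spelling of the reload builtin.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical scratch module, written to a temporary directory.
workdir = tempfile.mkdtemp()
module_path = pathlib.Path(workdir) / "scratch_mod.py"
module_path.write_text("VALUE = 1\n")
sys.path.insert(0, workdir)
sys.dont_write_bytecode = True   # keep the demo independent of .pyc files

scratch_mod = importlib.import_module("scratch_mod")
first = scratch_mod.VALUE

# Edit the source on disk, as one would while debugging.
module_path.write_text("VALUE = 2  # edited on disk\n")

# A second import is served from sys.modules, so the edit is not seen.
again = importlib.import_module("scratch_mod")
cached = again.VALUE

# An explicit reload re-executes the source in the same module object.
importlib.reload(scratch_mod)
reloaded = scratch_mod.VALUE
```

Clearing a namespace does not touch this cache, which is why a reset
alone does not give fresh copies of edited modules.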

As far as multiple users are concerned, I am not worried about that.
This goes back to my assumption that users will only share an
engine if they absolutely need to.  Thus if we are working together on
code and you need to restart the engines, I will know immediately
because we will likely be on the phone/email/irc.

> Also people may want the working directory to be different.
>
> Finally, people could do strange things like mess with python's own
> modules (changing os.path.sep, for instance) which would throw a
> wrench in other peoples' code.

Again, I think the same applies to these situations.  In my view,
IPython engines are like beds, you don't typically share one with
someone else unless both parties _really_ want to.


> Anyway, this is what I've been thinking about off-and-on for the past
> few days.  I offer it as food for thought.

I appreciate the thoughts, they are helpful for us as we think about
where to go next.  At some level, this stuff is really wide open.
There hasn't been much research done on truly collaborative computing
systems - let alone ones that are parallel.  Let us know if you have
other ideas or further comments on these.

Cheers,

Brian

> Greg
