[IPython-user] Trivial Parallelization (Redux)

Frank Horowitz Frank.Horowitz@csiro...
Fri Dec 12 00:43:21 CST 2008


Hi Fernando (et al.),

Thanks for the examples!

I've managed to get things to the point where I have executed the
equivalent of the mec.execute() call on the engines for my case. There
have been a couple of "gotchas", however.

The first "gotcha" is that triple-quoted (multi-line) strings seem to
be failing for some reason in my environment (replicated on both Mac
OS X and Linux, BTW). I needed to execute the initialization code
inside a single-quoted string, with semicolons separating the
statements. No big deal.
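
For reference, a minimal sketch of that single-line workaround. It runs locally with exec() as a stand-in for mec.execute(), so it needs no running cluster. The statements and the names `statements`, `init_code` and `ns` are placeholders I made up for illustration, not my real initialization code.

```python
# Initialization statements that would normally sit in a triple-quoted string.
statements = [
    "import math",
    "radius = 2.0",
    "area = math.pi * radius ** 2",
]

# Join them into one single-line string, separated by semicolons.
init_code = "; ".join(statements)

# Stand-in for mec.execute(init_code): run the string in a fresh namespace.
ns = {}
exec(init_code, ns)

print(init_code)
print(ns["area"])
```

Note that only simple statements can be chained this way; compound statements (def, class, for, ...) cannot follow a semicolon, so anything more involved probably belongs in an importable module.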

The second "gotcha" is that I was invoking the two engines (on my
Core2 Duo boxes) via a command line of "ipcluster -n 2". It turns out
that this sets the working directory for each engine to whatever
directory I happened to be in when I invoked that command line. Once I
figured this out ( print mec.execute("import os; print os.getcwd()")
is your friend! ), setting the path correctly in those initializations
was straightforward. A PYTHONPATH environment variable would probably
help here if it is propagated to the engines, but I haven't tested
that.
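
A rough sketch of the path fix, again run locally with exec() standing in for mec.execute(). The name `module_dir` is hypothetical, and I simply use the current directory as a stand-in for wherever the module to be imported actually lives.

```python
import os
import sys

# Hypothetical directory holding the module the engines must import;
# the current working directory is used here only as a stand-in.
module_dir = os.path.abspath(os.getcwd())

# Single-line initialization string that puts module_dir at the front of
# sys.path in whichever process executes it.
init_code = "import sys; sys.path.insert(0, %r)" % module_dir

# Stand-in for mec.execute(init_code): run it in this process.
exec(init_code, {})

print(sys.path[0] == module_dir)
```

Using an absolute path means the result no longer depends on the directory in which the engines happened to be started.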

I'm now stuck at the equivalent of your tc.map() call. My code snippet  
looks like:

tc.map(lambda x: fancyobject.chainConvert(x), gcds)

At the time that snippet is executed, the engines hold initialized  
instances of fancyobject, which has a method chainConvert(), and gcds  
is a list of strings.
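
To make the setup concrete, here is a purely local simulation of what I understand the pattern to be: the lambda refers to a global name that only exists in the engine's namespace and is looked up when the call runs. This is a sketch under my own assumptions, not how IPython.kernel actually ships functions around. The class `Fancy`, the body of chainConvert() and the sample `gcds` values are invented for the example.

```python
import types

# Hypothetical stand-in for the real class; only the method name
# chainConvert comes from my code, the body is invented.
class Fancy(object):
    def chainConvert(self, s):
        return s.upper()

# Simulated "engine" namespace, primed the way mec.execute() would prime it.
engine_ns = {"fancyobject": Fancy()}

# The task function: 'fancyobject' is a global name resolved at call time.
task = lambda x: fancyobject.chainConvert(x)

# Rebind the lambda's code object to the engine namespace, mimicking a
# function that is sent to an engine and run against the engine's globals.
remote_task = types.FunctionType(task.__code__, engine_ns)

gcds = ["a", "b", "c"]  # made-up sample strings
results = [remote_task(s) for s in gcds]
print(results)
```

Calling task("a") directly in the local session would fail with a NameError, since 'fancyobject' is not defined there; it only works against a namespace that has been primed first.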

I get a TypeError with the message:

TypeError: 'str' object does not support item assignment

My guess is that the problem comes from gcds being a sequence of
strings, and that strings are not allowed in this API. If that is the
case, is there a simple workaround? If not, any suggestions?
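
For what it's worth, plain Python gives this same message whenever code assigns to an index of a string, so something in the call chain may be treating a str as if it were a mutable sequence. A minimal reproduction follows, with nothing IPython-specific in it; whether this is what happens inside tc.map() is only a guess on my part.

```python
s = "abc"

try:
    s[0] = "x"  # strings are immutable, so item assignment fails
except TypeError as err:
    message = str(err)

print(message)

# A list of characters, by contrast, accepts item assignment.
chars = list(s)
chars[0] = "x"
print("".join(chars))
```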

Thanks again for your help!
	Frank Horowitz


On 11/12/2008, at 5:35 PM, Fernando Perez wrote:

> Hi Frank,
>
> On Wed, Dec 10, 2008 at 11:45 PM, Frank Horowitz
> <Frank.Horowitz@csiro.au> wrote:
>> Hi All,
>>
>> I'm having a little trouble getting past the "first usage" hurdle of
>> IPython parallelization. (This is likely a FAQ.) It's closely related
>> to Jose Gomez-Dans' thread from earlier this month, but with an added
>> wrinkle.
>>
>> I'm trying to use TaskClient.map() to do its job, but the added
>> wrinkle beyond Jose's case is that my "function" is actually a bound
>> method of an object rather than a pure function.  The TC code throws
>> a TypeError, complaining that a task function must be a FunctionType.
>> (Quite true, a bound method is not a pure FunctionType...)
>>
>> Is there any easy workaround, perhaps via defining a doIt() type
>> function that was mentioned in that previous thread? If so, I've not
>> been able to discover it...
>>
>> TIA for any help you might be able to provide!
>
> I think we should work on making this API somewhat cleaner, but
> here's a first stab at a simple solution.  The key is to understand
> how the mechanism works: for your engines to execute a call, they need
> to be able to have in memory an instance of the object they will call,
> whether it's a function or an instance method.  The map methods know
> how to serialize 'on the go' a pure function, but they currently don't
> do the same for class instances.  While we could add that, there are
> issues with unserializing the instance at the other end, state
> handling, etc.
>
> So the following example takes a slightly more verbose, but
> perhaps safer, approach: split the problem into two files, put the
> classes you want in one module that all engines can import, and then
> create the callable objects directly on the engines.  Once they have
> been created, you can use a simple lambda to wrap the call you need.
> The code is easy:
>
> In [18]: cat simpleclass.py
> class F(object):
>    def doit(self,x):
>        return x**2
>
> In [19]: cat simpletask.py
> from IPython.kernel import client
>
> # Get two handles on the same group of engines
> mec = client.MultiEngineClient()
> tc = client.TaskClient()
>
> # Use the direct execution one to 'prime' the engines with the
> # objects we need
> mec.execute("""
> from simpleclass import F
> fancyobject = F()
> """)
>
> # Now, use the load-balanced tc to scatter calls to .doit() over a
> # range.  Note that this lambda refers to a *remote* name,
> # 'fancyobject', that we created on the engines above.  This lambda is
> # unpacked remotely and executed on the engines.
> print 'Remote execution:',tc.map(lambda x: fancyobject.doit(x),
>                                 range(10))
>
> In [20]: run simpletask.py
> Remote execution: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>
>
>
> Let us know if this helps, and then we'll add this to the docs if you
> find it useful.
>
> Cheers,
>
> f


