[IPython-user] Trivial Parallelization (Redux)

Frank Horowitz frank.horowitz@csiro...
Fri Dec 12 01:44:14 CST 2008


Ooops!

Err, never mind on the TypeError. That was a problem in my serial  
code. :-(

I've fixed the code, and am now executing in parallel. (Lesson one.  
Make sure your code runs serially first!)
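
A minimal sketch of that serial check, assuming a stand-in class F shaped like the simpleclass.py example quoted further down (it is not the actual code from this thread). The idea is to push the same lambda through the builtin map() before handing it to tc.map(), so that ordinary bugs show up locally:

```python
# Stand-in for the real class; F and doit() mirror the simpleclass.py
# example quoted later in the thread, not the poster's actual code.
class F(object):
    def doit(self, x):
        return x ** 2

fancyobject = F()
inputs = list(range(10))

# Serial dry run: the same lambda that would be handed to tc.map(),
# run through the builtin map() so errors surface locally first.
serial_results = list(map(lambda x: fancyobject.doit(x), inputs))
print(serial_results)
```

If this raises, the problem is in the serial code and not in the parallel machinery.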

Thanks again for the help. The gotchas and my workarounds are still
extant....
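
As a rough illustration of the two workarounds described in the quoted message below (semicolon-separated statements in a single-line string, and fixing the import path on the engines), the initialization string can be assembled and tried locally before it is passed to mec.execute(). The names init_statements, init_code, engine_namespace and project_dir are hypothetical, and exec() here only stands in for an engine; nothing is sent to a cluster.

```python
import os

# Hypothetical directory that holds the modules the engines must import.
project_dir = os.getcwd()

# One statement per list entry, joined with semicolons so the whole
# initialization fits on one line (no triple-quoted string needed).
init_statements = [
    "import os",
    "import sys",
    "sys.path.insert(0, %r)" % project_dir,
    "cwd = os.getcwd()",
]
init_code = "; ".join(init_statements)

# Local stand-in for an engine: run the string in a scratch namespace.
engine_namespace = {}
exec(init_code, engine_namespace)
print(init_code)
print(engine_namespace["cwd"])
```

Whether the same string behaves identically on the engines is an assumption that would need checking against a running cluster.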

Cheers,
	Frank Horowitz



On 12/12/2008, at 3:43 PM, Frank Horowitz wrote:

> Hi Fernando (et al.),
>
> Thanks for the examples!
>
> I've managed to get things to the point where I have executed the
> equivalent of the mec.execute() call on the engines for my case.
> There have been a couple of "gotchas", however.
>
> The first "gotcha" is that the triple-quoted (multi-line) strings seem
> to be failing for some reason or other in my environment. (Replicated
> on both Mac OS X and Linux, BTW.) I needed to execute the
> initialization code inside a single-quoted string, with semicolons
> separating the statements. No big deal.
>
> The second "gotcha" is that I was invoking the two engines (on my
> Core2 Duo boxes) via a command line of "ipcluster -n 2". It turns out
> that this sets the working directory for each engine to whatever
> directory I happened to be in at the time I invoked that command line.
> Once I figured this out ( print mec.execute("import os; print
> os.getcwd()") is your friend! ), setting the path correctly in those
> initializations is straightforward. Probably a PYTHONPATH environment
> variable would help here if it is propagated to the engines, but I
> haven't tested that.
>
> I'm now stuck at the equivalent of your tc.map() call. My code snippet
> looks like:
>
> tc.map(lambda x: fancyobject.chainConvert(x), gcds)
>
> At the time that snippet is executed, the engines hold initialized
> instances of fancyobject, which has a method chainConvert(), and gcds
> is a list of strings.
>
> I get a TypeError with the message:
>
> TypeError: 'str' object does not support item assignment
>
> I'm guessing that the problem is that gcds is a sequence of strings,
> and that strings are illegal in this API. If this is the case, are
> there any simple workarounds? If not, any suggestions?
>
> Thanks again for your help!
> 	Frank Horowitz
>
>
> On 11/12/2008, at 5:35 PM, Fernando Perez wrote:
>
>> Hi Frank,
>>
>> On Wed, Dec 10, 2008 at 11:45 PM, Frank Horowitz
>> <Frank.Horowitz@csiro.au> wrote:
>>> Hi All,
>>>
>>> I'm having a little trouble getting past the "first usage" hurdle of
>>> IPython parallelization. (This is likely a FAQ.) It's closely related
>>> to Jose Gomez-Dans' thread from earlier this month, but with an added
>>> wrinkle.
>>>
>>> I'm trying to use TaskClient.map() to do its job, but the added
>>> wrinkle beyond Jose's case is that my "function" is actually a bound
>>> method of an object rather than a pure function.  The TC code throws
>>> a TypeError, complaining that a task function must be a FunctionType.
>>> (Quite true, a bound method is not a pure FunctionType...)
>>>
>>> Is there any easy workaround, perhaps via defining a doIt() type
>>> function that was mentioned in that previous thread? If so, I've not
>>> been able to discover it...
>>>
>>> TIA for any help you might be able to provide!
>>
>> I think we should work on making this API somewhat cleaner, but
>> here's a first stab at a simple solution.  The key is to understand
>> how the mechanism works: for your engines to execute a call, they
>> need to have in memory an instance of the object they will call,
>> whether it's a function or an instance method.  The map methods know
>> how to serialize a pure function 'on the go', but they currently
>> don't do the same for class instances.  While we could add that,
>> there are issues with unserializing the instance at the other end,
>> state handling, etc.
>>
>> So the following example follows what is a slightly more verbose, but
>> perhaps safer, approach: split the problem into two files, put the
>> classes you want in one module that all engines can import, and then
>> create the callable objects directly on the engines.  Once they have
>> been created, you can use a simple lambda to wrap the call you need.
>> The code is easy:
>>
>> In [18]: cat simpleclass.py
>> class F(object):
>>   def doit(self,x):
>>       return x**2
>>
>> In [19]: cat simpletask.py
>> from IPython.kernel import client
>>
>> # Get two handles on the same group of engines
>> mec = client.MultiEngineClient()
>> tc = client.TaskClient()
>>
>> # Use the direct execution one to 'prime' the engines with the
>> # objects we need
>> mec.execute("""
>> from simpleclass import F
>> fancyobject = F()
>> """)
>>
>> # Now, use the load-balanced tc to scatter calls to .doit() over a
>> # range.  Note that this lambda refers to a *remote* name,
>> # 'fancyobject', that we created on the engines above.  This lambda
>> # is unpacked remotely and executed on the engines.
>> print 'Remote execution:',tc.map(lambda x: fancyobject.doit(x),
>>                                range(10))
>>
>> In [20]: run simpletask.py
>> Remote execution: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>
>>
>>
>> Let us know if this helps, and then we'll add this to the docs if you
>> find it useful.
>>
>> Cheers,
>>
>> f
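
The FunctionType restriction discussed in the thread above can be seen in plain Python, without any cluster. The sketch below reuses the class F from the quoted simpleclass.py example; the names bound and wrapper are hypothetical, and the isinstance() checks only illustrate the distinction, they are not taken from the TaskClient source.

```python
import types

class F(object):
    def doit(self, x):
        return x ** 2

fancyobject = F()

# A bound method carries its instance with it and is a MethodType.
bound = fancyobject.doit

# A lambda that looks up 'fancyobject' by name is a plain function.
wrapper = lambda x: fancyobject.doit(x)

print(isinstance(bound, types.FunctionType))    # False
print(isinstance(bound, types.MethodType))      # True
print(isinstance(wrapper, types.FunctionType))  # True
print(wrapper(3) == bound(3))                   # True
```

The wrapper only resolves 'fancyobject' when it is called, which is why the name has to exist on the engines before the map runs.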


