[IPython-dev] Musings: syntax for high-level expression of parallel (and other) execution control

Fernando Perez fperez.net@gmail....
Fri Sep 4 03:31:08 CDT 2009

Hi all,

I know I should have been hard at work on continuing the branch review
work I have in an open tab of Brian's push, but I couldn't resist.
Please bear with me, this is a bit technical but, I hope, very
interesting in the long run for us...

This part of Ars Technica's excellent review of Snow Leopard:


shows how Apple tackled the problem of providing civilized primitives
to express parallelism in applications and a mechanism to make useful
decisions based on this information.   The idea is to combine a
kernel-level dispatcher (GCD), a beautiful extension to the C language
(yes, they extended C!) in the form of anonymous code blocks, and an
API to inform GCD of your parallelism breakup easily, so GCD can use
your intentions at runtime efficiently.  It's really a remarkably
simple, yet effective (IMHO) combination of tools to tackle a thorny
problem.
In any case, what does all this have to do with us?  For a long time
we've wondered about how to provide the easiest, simplest APIs that
appear natural to the user, that are easy to convert into serial
execution mode trivially (with a simple global switch for debugging,
NOT changing any actual code everywhere), and that can permit
execution via ipython.  A while ago I hacked something up via 'with'
and context managers, but it was so horrible and brittle (it involved
stack manipulations, manual source introspection and exception
injection) that I realized it could never really fly for production
work.

But this article on GCD got me trying my 'with' approach again, and I
realized that syntactically it felt quite nice: I could write python
versions of the code examples in that review.  Yet the whole 'with'
mess killed it for me.  And then it hit me that decorators could be
abused just a little bit to get the same job done [1]!  While this may
be somewhat of an abuse, it does NOT involve source introspection or
stack manipulations, so in principle it's 100% kosher, robust python.
A little weird the first time you see it, but bear with me.

The code below shows an implementation of a simple for loop directly
and via a decorator.  Both versions do the same thing, but the point
is that by providing such decorators, we can *trivially* provide a
GCD-style API for users to express their parallelism and have
execution chunks handled by ipython remotely.

It's obvious that such decorators can also be used to dispatch code to
Cython, to a GPU, to a CorePy-based optimizer, to a profiler, etc.  I
think this could be a useful idea in more than one context, and it
certainly feels to me like one of the missing API/usability pieces
we've struggled with for the ipython distributed machinery.
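As a concrete sketch of that dispatch idea: here is a hypothetical
threaded variant of the for_each decorator shown in the code example
below.  The name for_each_threaded and the use of concurrent.futures
(which did not exist when this was written) are my assumptions, not
part of the original post; the point is only that the decorator
interface stays identical while the execution backend changes.

```python
from concurrent.futures import ThreadPoolExecutor

def for_each_threaded(iterable):
    """Hypothetical dispatch variant: same decorator interface as the
    serial for_each, but each call of the loop body runs on a thread
    pool (concurrent.futures assumed, not in the original post)."""
    def call(func):
        with ThreadPoolExecutor() as pool:
            # Force evaluation so all loop bodies finish before returning.
            list(pool.map(func, iterable))
    return call

count = 10
data = [3.0 * j for j in range(count)]
results = [None] * count

@for_each_threaded(range(count))
def loop(i):
    results[i] = data[i] / 2
```

The user-visible code (the decorated loop body) is unchanged; only the
decorator's internals decide where the work actually runs.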



[1] What clicked in my head was tying the 'with' mess to how the Sage
notebook uses the @interact decorator to immediately call the
decorated function rather than decorating it and returning it.  This
immediate-consumption (ab)use of a decorator is what I'm using.
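To make that pattern concrete, here is a minimal example of the
immediate-consumption trick (the names run_now/compute are illustrative,
not from Sage):

```python
collected = []

def run_now(func):
    # Immediate-consumption decorator: instead of wrapping func and
    # returning a replacement for later use, call it right away at
    # definition time and record its result.
    collected.append(func())
    return func

@run_now
def compute():
    return 2 + 2

# By the time the 'def' statement finishes, compute() has already run
# once and its result sits in 'collected'.
```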

### CODE example

# Consider a simple pair of 'loop body' and 'loop summary' functions:
def do_work(data, i):
    return data[i]/2

def summarize(results, count):
    return sum(results[:count])

# And some 'dataset' (here just a list of 10 numbers)
count = 10
data = [3.0*j for j in range(count) ]

# That we'll process.  This is our processing loop, implemented as a regular
# serial function that preallocates storage and then goes to work.
def loop_serial():
    results = [None]*count

    for i in range(count):
        results[i] = do_work(data, i)

    return summarize(results, count)

# The same thing can be done with a decorator:
def for_each(iterable):
    """This decorator-based loop does a normal serial run.

    But in principle it could be doing the dispatch remotely, or into a
    thread pool, etc.
    """
    def call(func):
        for item in iterable:
            func(item)

    return call

# This is the actual code of the decorator-based loop:
def loop_deco():
    results = [None]*count

    @for_each(range(count))
    def loop(i):
        results[i] = do_work(data, i)

    return summarize(results, count)

# Test
assert loop_serial() == loop_deco()
print('OK')
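On the 'simple global switch for debugging' point above: one way to
sketch it (the PARALLEL flag and the threaded branch are my additions,
and concurrent.futures postdates this post) is to put the choice of
execution backend inside for_each itself, behind a single module-level
setting, so user code like loop_deco never changes:

```python
from concurrent.futures import ThreadPoolExecutor

PARALLEL = False  # flip to True to dispatch loop bodies to a thread pool

def for_each(iterable):
    """Same interface as the serial version above; the execution
    backend is selected by one global switch rather than by editing
    any user code."""
    def call(func):
        if PARALLEL:
            with ThreadPoolExecutor() as pool:
                list(pool.map(func, iterable))
        else:
            for item in iterable:
                func(item)
    return call
```

Flipping PARALLEL changes every decorated loop in the program at once,
which is exactly the 'NOT changing any actual code' property this API
is after.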
