[IPython-User] Multiprocessing Pool woes

Jeffrey Bush jeff@coderforlife....
Tue Feb 11 15:00:39 CST 2014

There are some serious limitations to what functions can be used by
multiprocessing.Pool (in any interactive prompt, not just IPython). The
main one, which you are running into right now, is that the function must be
accessible through an import of your module. It can't be defined
dynamically (i.e. at the interactive prompt or through closures or most
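
To illustrate the import requirement: pickle, which multiprocessing uses to
send the function to the worker processes, stores a function as a module
name plus a function name, not as code. A minimal sketch, written so that it
should run on both Python 2.7 and Python 3 (make_adder is a made-up name
for illustration, not something from your example):

```python
import math
import pickle

# A function is pickled by reference: the payload holds only the module
# name and the function name, never the function's code.
payload = pickle.dumps(math.sqrt)
by_reference = b"math" in payload and b"sqrt" in payload
same_object = pickle.loads(payload) is math.sqrt

def make_adder(n):  # hypothetical helper, not from the original post
    def add(x):
        return x + n
    return add

# A nested function cannot be looked up by name in any module,
# so pickling it fails.
try:
    pickle.dumps(make_adder(1))
    nested_pickled = True
except (pickle.PicklingError, AttributeError, TypeError):
    nested_pickled = False
```

The worker process has to be able to import that module and find that name
itself, which is why a function that only exists in the interactive session
is a problem.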

With all that said, here is a solution that works in my minimal testing.
Currently it does not support global variables: the rebuilt function runs
with poolable's module globals, not those of your session. Support could be
added with additional work (the code below has some of the necessary code
commented out, but it is more complicated than it seems, so I skipped it
for now). This means that if you call any other functions from
the primary function, you will have to either define them in the dynamic
function, import them in the dynamic function yourself, or fix the globals

Save the following as poolable.py (I wrote this code myself and release it
into the public domain). It needs to be either in the current directory or
on the Python path (e.g. in site-packages).


from types import FunctionType
import marshal

def _applicable(*args, **kwargs):
    name = kwargs['__pw_name']
    code = marshal.loads(kwargs['__pw_code'])
    gbls = globals() #gbls = marshal.loads(kwargs['__pw_gbls'])
    defs = marshal.loads(kwargs['__pw_defs'])
    clsr = marshal.loads(kwargs['__pw_clsr'])
    fdct = marshal.loads(kwargs['__pw_fdct'])
    func = FunctionType(code, gbls, name, defs, clsr)
    func.func_dict = fdct  # restore the function's attribute dict
    del kwargs['__pw_name']
    del kwargs['__pw_code']
    #del kwargs['__pw_gbls']
    del kwargs['__pw_defs']
    del kwargs['__pw_clsr']
    del kwargs['__pw_fdct']
    return func(*args, **kwargs)

def make_applicable(f, *args, **kwargs):
    if not isinstance(f, FunctionType):
        raise ValueError('argument must be a function')
    kwargs['__pw_name'] = f.func_name
    kwargs['__pw_code'] = marshal.dumps(f.func_code)
    #kwargs['__pw_gbls'] = marshal.dumps(f.func_globals)
    kwargs['__pw_defs'] = marshal.dumps(f.func_defaults)
    kwargs['__pw_clsr'] = marshal.dumps(f.func_closure)
    kwargs['__pw_fdct'] = marshal.dumps(f.func_dict)
    return _applicable, args, kwargs

def _mappable(x):
    x,name,code,defs,clsr,fdct = x
    code = marshal.loads(code)
    gbls = globals() #gbls = marshal.loads(gbls)
    defs = marshal.loads(defs)
    clsr = marshal.loads(clsr)
    fdct = marshal.loads(fdct)
    func = FunctionType(code, gbls, name, defs, clsr)
    func.func_dict = fdct  # restore the function's attribute dict
    return func(x)

def make_mappable(f, iterable):
    if not isinstance(f, FunctionType):
        raise ValueError('argument must be a function')
    name = f.func_name
    code = marshal.dumps(f.func_code)
    #gbls = marshal.dumps(f.func_globals)
    defs = marshal.dumps(f.func_defaults)
    clsr = marshal.dumps(f.func_closure)
    fdct = marshal.dumps(f.func_dict)
    return _mappable, ((i,name,code,defs,clsr,fdct) for i in iterable)


Now your example can be re-written to work:

from multiprocessing import Pool
from poolable import make_applicable, make_mappable

def f(x): return x*x

pool = Pool(processes=4)
result = pool.apply_async(*make_applicable(f, 10))
print result.get(timeout=1)
print pool.map(*make_mappable(f, range(10)))

Important differences are:

   1. You do NOT need the if statement checking for the __main__ module.
   2. You give Pool.apply and Pool.apply_async the expanded (using *)
   result of make_applicable, which takes your function, arguments, and
   keyword arguments. You do not place the arguments and keyword arguments
   in [] and {}.
   3. You give Pool.map, Pool.map_async, Pool.imap, and Pool.imap_unordered
   the expanded (using *) result of make_mappable, which takes your function
   and the iterable.

How does this work? It serializes all the information about the function
(including the code but not the globals at the moment), augments the
arguments/iterable to contain this information, and returns a function that
apply/map can use and utilize the extra information to unserialize the
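
The core of that round trip can be sketched in a few lines. This is a
simplified illustration, not the poolable.py code itself: it uses the
__code__, __name__ and __defaults__ attribute names (available in Python
2.6+ and Python 3) instead of func_code and friends, it only handles
functions without closures, and scale is a made-up example function.

```python
import marshal
from types import FunctionType

def scale(x, factor=2):  # hypothetical example function
    return x * factor

# Sending side: turn the code object and the defaults into plain bytes.
code_bytes = marshal.dumps(scale.__code__)
defaults_bytes = marshal.dumps(scale.__defaults__)

# Receiving side: rebuild an equivalent function from those bytes.
rebuilt = FunctionType(
    marshal.loads(code_bytes),      # code object
    globals(),                      # globals of the *rebuilding* module
    scale.__name__,                 # function name
    marshal.loads(defaults_bytes),  # default argument values
    None,                           # no closure in this sketch
)

print(rebuilt(5))
print(rebuilt(5, factor=3))
```

Note that the marshal format is specific to the Python version, so this
only works when both sides run the same interpreter, which is the case for
Pool workers.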

If you have any more questions, I am glad to answer them. So far I tested
this with IPython console on Windows 7 x64 running Python v2.7.5 32-bit
with IPython v1.1. If I have more time I will try to make something that
can retain the globals as well. I am sure there are plenty of other
caveats, and as MinRK has pointed out, all of these problems (and caveats
that this probably has) are due to the different way that Windows and *nix
handle process creation.
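
For reference, newer Python versions (3.4 and later, so not the 2.7 setup I
tested with) expose this difference as the multiprocessing "start method".
A small sketch, assuming Python 3.4+, that only reports what your platform
supports:

```python
import multiprocessing

# Start methods this platform supports. Windows offers only "spawn"
# (a fresh interpreter that must import or unpickle everything), while
# Unix-like systems also offer "fork" (the child inherits the parent's
# memory, including functions defined at the prompt).
available = multiprocessing.get_all_start_methods()
current = multiprocessing.get_start_method()

print(available)
print(current)
```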


On Tue, Feb 11, 2014 at 7:32 AM, pbr <tim.pierson@gmail.com> wrote:

> I've seen this problem reported on stackoverflow, but I can't seem to find a
> solution.  (And I see that other IPython users don't seem to encounter it,
> so I thought I'd ask).  I'm on windows 7 64 (32bit python 2.7.6) and the
> following Multiprocessing Pool test returns a timeout error:
> from multiprocessing import Pool
> def f(x):
>     return x*x
> if __name__ == '__main__':
>     pool = Pool(processes=4)              # start 4 worker processes
>     result = pool.apply_async(f, [10])    # evaluate "f(10)" asynchronously
>     print result.get(timeout=1)           # prints "100" unless your computer is *very* slow
>     print pool.map(f, range(10))          # prints "[0, 1, 4,..., 81]"
> All other multiprocessing tests also throw errors (sometimes in console:
> running function f() writes: Attribute error "module" has no attribute 'f')
> Is there a way to use multiprocessing on windows in ipython?
> Thanks