[SciPy-user] Parallel processing with Python

Sturla Molden sturla@molden...
Wed Feb 18 17:37:32 CST 2009

I know this is not directly related to SciPy, but it may be of interest to
some subscribers to this list.

About a year ago, I posted a scheme to comp.lang.python describing how to
use isolated interpreters and threads to circumvent the GIL on SMPs:


One interpreter per thread is how Tcl works. Erlang also uses isolated
threads that only communicate through messages (as opposed to shared
objects). "Appdomains" are also available in the .NET framework, and in
Java as "Java isolates". They are potentially very useful as multicore
CPUs become abundant. They allow one process to run one independent Python
interpreter on each available CPU core.

In Python, "appdomains" can be created by embedding the Python interpreter
multiple times in a process, and associating each interpreter with a
thread. For this to work, we have to make multiple copies of the Python
DLL and rename them (e.g. Python25-0.dll, Python25-1.dll,
Python25-2.dll, etc.). Otherwise the dynamic loader will just return a
handle to the already imported DLL. As DLLs can be accessed with ctypes,
we don't even have to program a line of C to do this. We can start up a
Python interpreter and use ctypes to embed more interpreters
into it, associating each interpreter with its own thread. ctypes takes
care of releasing the GIL in the parent interpreter, so calls to these
sub-interpreters become asynchronous. I had a mock-up of this scheme
working. Martin Löwis replied that he doubted this would work, and
pointed out that Python extension libraries (.pyd files) are DLLs as
well. They would only be imported once, so their global states would
clash, producing havoc:
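For concreteness, the mock-up described above can be sketched with ctypes
along these lines; the DLL path, the copy names, and the code run in each
sub-interpreter are my own illustrative assumptions, not a tested
implementation:

```python
# Sketch of the copy-and-embed scheme. The DLL location and the copy
# names are illustrative assumptions; this only makes sense on Windows.
import ctypes
import shutil
import sys
import threading

def dll_copy_name(base, i):
    # e.g. dll_copy_name("python25", 0) -> "python25-0.dll"
    return "%s-%d.dll" % (base, i)

def run_isolated(dll_path, i, code):
    # A private, renamed copy of the DLL forces the Windows loader to
    # map a fresh image, with its own GIL and interpreter state.
    private = dll_copy_name("python25", i)
    shutil.copyfile(dll_path, private)
    interp = ctypes.CDLL(private)  # ctypes drops our GIL around calls
    interp.Py_Initialize()
    interp.PyRun_SimpleString(code.encode("ascii"))
    interp.Py_Finalize()

if sys.platform == "win32":
    threads = [threading.Thread(target=run_isolated,
                                args=(r"C:\Windows\System32\python25.dll",
                                      i, "x = %d * %d" % (i, i)))
               for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Each thread then drives its own interpreter through plain C API calls.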


He was right, of course, but also wrong. In fact I had already proven
him wrong by importing the same DLL multiple times. If it can be done
for Python25.dll, it can be done for any other DLL as well - including
.pyd files - in exactly the same way. Thus what remains is to change
Python's dynamic loader to use the same "copy and import" scheme. This
can be done either by changing Python's C code, or (at least on Windows)
by redirecting the LoadLibrary API call from kernel32.dll to a custom
DLL. Both approaches are quite easy and require minimal C coding.
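To illustrate the "copy and import" idea at the Python level, here is a
rough sys.meta_path hook written against Python 3's importlib; every name
here (the finder class, INTERP_TAG, the helper) is hypothetical, and it
sidesteps rather than performs the LoadLibrary redirection:

```python
# Hypothetical "copy and import" hook: before an extension module is
# loaded, copy its file to a per-interpreter name, so each interpreter
# gets a private image of the extension's global state.
import importlib.machinery
import importlib.util
import os
import shutil
import sys

INTERP_TAG = 0  # would differ for each embedded interpreter

def private_copy(path, tag):
    """Per-interpreter file name: foo.pyd -> foo-<tag>.pyd."""
    root, ext = os.path.splitext(path)
    return "%s-%d%s" % (root, tag, ext)

class CopyAndImportFinder:
    """Intercepts extension modules and loads a private copy of each."""
    def find_spec(self, name, path=None, target=None):
        for entry in (path or sys.path):
            for suffix in importlib.machinery.EXTENSION_SUFFIXES:
                candidate = os.path.join(entry, name.split(".")[-1] + suffix)
                if os.path.isfile(candidate):
                    copy = private_copy(candidate, INTERP_TAG)
                    shutil.copyfile(candidate, copy)
                    loader = importlib.machinery.ExtensionFileLoader(name, copy)
                    return importlib.util.spec_from_file_location(
                        name, copy, loader=loader)
        return None  # not an extension; let the normal machinery handle it

# sys.meta_path.insert(0, CopyAndImportFinder())  # activation (sketch only)
```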

Thus it is quite easy to make multiple, independent Python interpreters
live isolated lives in the same process. As opposed to multiple processes,
they can communicate without involving any IPC. It would also be possible
to design proxy objects allowing one interpreter access to an object in
another. Immutable objects such as strings would be particularly easy to
share.
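The message-passing side can be sketched without any of the DLL
machinery: since the interpreters share one address space, an ordinary
thread-safe queue of immutable objects can play the role of an
Erlang-style mailbox. The Mailbox class and names below are illustrative,
with plain threads standing in for isolated sub-interpreters:

```python
# Sketch of Erlang-style mailboxes between in-process interpreters.
# The point is the communication pattern: immutable messages, no
# shared mutable state, no IPC.
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

class Mailbox(object):
    """One mailbox per interpreter; only immutable messages allowed."""
    def __init__(self):
        self._q = queue.Queue()

    def send(self, msg):
        # Immutable payloads (str, bytes, tuples of these) are safe to
        # hand to another interpreter without copying or locking.
        assert isinstance(msg, (str, bytes, tuple))
        self._q.put(msg)

    def receive(self):
        return self._q.get()  # blocks until a message arrives

def worker(inbox, outbox):
    # Stands in for code running inside an isolated sub-interpreter.
    outbox.send(inbox.receive().upper())

inbox, outbox = Mailbox(), Mailbox()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()
inbox.send("hello")
reply = outbox.receive()   # "HELLO"
t.join()
```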

This very simple scheme should allow parallel processing with Python
similar to how it's done in Erlang, without the GIL getting in our way. At
least on Windows this can be done without touching the CPython source at
all. I am not sure about Linux, though; it may be necessary to patch the
CPython source to make it work there.

Sturla Molden
