[SciPy-user] Parallel processing with Python

Andrew Straw strawman@astraw....
Thu Feb 19 00:45:10 CST 2009

Hi Sturla, I think this is a very interesting idea. I once ran into 
weird and mysterious issues with dlopen() when trying to do something 
similar on Linux, but I hadn't thought of the rename-the-shared-library 
trick. If we 
invented some kind of syntax for software transactional memory (STM), we 
might really be playing with fire. I might give this a try on a couple 
of problems I'm working on (which generally have more to do with making 
complicated stuff happen quickly -- with low latency -- than crunching 
tons of data). Anyhow, please keep us informed of further progress!


Sturla Molden wrote:
> I know this is not directly related to SciPy, but it may be of interest to
> some subscribers to this list.
> About a year ago, I posted a scheme to comp.lang.python describing how to
> use isolated interpreters and threads to circumvent the GIL on SMPs:
> http://groups.google.no/group/comp.lang.python/msg/0351c532aad97c5e?hl=no&dmode=source
> One interpreter per thread is how Tcl works. Erlang also uses isolated
> threads that only communicate through messages (as opposed to shared
> objects). "Appdomains" are also available in the .NET framework, and in
> Java as "Java isolates". They are potentially very useful as multicore
> CPUs become abundant. They allow one process to run one independent Python
> interpreter on each available CPU core.
> In Python, "appdomains" can be created by embedding the Python interpreter
> multiple times in a process, and associating each interpreter with a
> thread. For this to work, we have to make multiple copies of the Python
> DLL and rename them (e.g. Python25-0.dll, Python25-1.dll,
> Python25-2.dll, etc.). Otherwise the dynamic loader will just return a
> handle to the already imported DLL. As DLLs can be accessed with ctypes,
> we don't even have to program a line of C to do this. We can start up a
> Python interpreter and use ctypes to embed more interpreters
> into it, associating each interpreter with its own thread. ctypes takes
> care of releasing the GIL in the parent interpreter, so calls to these
> sub-interpreters become asynchronous. I had a mock-up of this scheme
> working. Martin Löwis replied he doubted this would work, and pointed out
> that Python extension libraries (.pyd files) are DLLs as well. They would
> only be imported once, so their global states would clash, producing
> havoc:
> http://groups.google.no/group/comp.lang.python/msg/0a7a22910c1d5bf5?hl=no&dmode=source
> He was right, of course, but also wrong. In fact I had already proven him
> wrong by importing a DLL multiple times. If it can be done for
> Python25.dll, it can be done for any other DLL as well - including .pyd
> files - in exactly the same way. Thus what remains is to change Python's
> dynamic loader to use the same "copy and import" scheme. This can be done
> either by changing Python's C code or (at least on Windows) by redirecting
> the LoadLibrary API call from kernel32.dll to a custom DLL. Both are quite
> easy and require minimal C coding.
> Thus it is quite easy to make multiple, independent Python interpreters
> live isolated lives in the same process. As opposed to multiple processes,
> they can communicate without involving any IPC. It would also be possible
> to design proxy objects allowing one interpreter access to an object in
> another. Immutable objects such as strings would be particularly easy to
> share.
> This very simple scheme should allow parallel processing with Python
> similar to how it's done in Erlang, without the GIL getting in our way. At
> least on Windows this can be done without touching the CPython source at
> all. I am not sure about Linux though. It may be necessary to patch the
> CPython source to make it work there.
> Sturla Molden
> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
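
For readers who want to experiment with the "copy and rename" step
described in the quoted message, here is a minimal sketch. It is not
Sturla's code. The helper names make_numbered_copies and load_copies
are hypothetical, and the sketch only assumes that the dynamic loader
treats each distinctly named file as a separate library, as the post
describes for Windows. It does not initialize any interpreter.

```python
import ctypes
import shutil
import tempfile
from pathlib import Path


def make_numbered_copies(library_path, count, target_dir=None):
    """Copy one shared library to `count` uniquely named files.

    Uses the naming from the post: Python25.dll becomes
    Python25-0.dll, Python25-1.dll, and so on.
    (Hypothetical helper, for illustration only.)
    """
    source = Path(library_path)
    # Default to a fresh temporary directory so the original install is untouched.
    target = Path(target_dir) if target_dir is not None else Path(tempfile.mkdtemp())
    copies = []
    for index in range(count):
        destination = target / f"{source.stem}-{index}{source.suffix}"
        shutil.copy2(source, destination)
        copies.append(destination)
    return copies


def load_copies(paths):
    """Load every copy with ctypes and return the library objects.

    Assumption: because each path names a different file, the loader
    maps each one separately, so each copy has its own global state.
    (Hypothetical helper, for illustration only.)
    """
    return [ctypes.CDLL(str(path)) for path in paths]
```

Calling make_numbered_copies(r"C:\Python25\Python25.dll", 3) (the path
is only an example) and passing the result to load_copies would give
three library objects. Functions called through ctypes.CDLL run with
the calling interpreter's GIL released, which is what the post relies
on for asynchronous calls. Extension modules (.pyd files) are not
handled by this sketch.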
