[SciPy-User] Multiprocessing and shared memory
Sun Oct 18 13:57:03 CDT 2009
Felix Schlesinger wrote:
> 1. Using multiprocessing.Array and passing it to numpy.frombuffer (see
> This has the disadvantage of messing with the ctypes to numpy
> conversion and generally looks clumsy.
multiprocessing.Array cannot be communicated between processes except at
fork. Passing it through multiprocessing.Queue will fail. You must
preallocate all shared memory in advance of instantiating
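A minimal sketch of the preallocate-then-fork pattern, assuming a POSIX system where the "fork" start method is available. The helper names (fill, run_demo) are made up for illustration. The shared buffer is allocated first and handed to the workers as a Process argument, so it is inherited at fork rather than sent through a Queue:

```python
import multiprocessing as mp

import numpy as np

# Assumption: POSIX system with the "fork" start method available.
ctx = mp.get_context("fork")


def fill(shared, start, stop, value):
    # Re-wrap the inherited ctypes buffer as an ndarray; no data is copied.
    view = np.frombuffer(shared.get_obj(), dtype=np.float64)
    view[start:stop] = value


def run_demo(n=8):
    # Allocate the shared buffer BEFORE any Process object is created.
    shared = ctx.Array("d", n)  # "d" = C double, zero-initialised
    half = n // 2
    workers = [
        ctx.Process(target=fill, args=(shared, 0, half, 1.0)),
        ctx.Process(target=fill, args=(shared, half, n, 2.0)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # The parent sees what the children wrote into the same memory.
    return np.frombuffer(shared.get_obj(), dtype=np.float64).copy()
```

Each worker writes to a disjoint slice, so the lock that comes with the Array is not taken here; overlapping writes would need it.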
> 2. Using numpy.memmap.
> This has the disadvantage that I need to create file descriptors, keep
> track of them and make sure that they are closed at the right moment
numpy.memmap uses BSD mmap, not System V IPC. That means the shared
segment has no name, so it must be created in the parent prior to forking.
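As a rough sketch of creating the mapping in the parent before forking, assuming a POSIX "fork" start method and a temporary backing file (child_write and memmap_demo are illustrative names, not from any library):

```python
import multiprocessing as mp
import os
import tempfile

import numpy as np

# Assumption: POSIX system with the "fork" start method available.
ctx = mp.get_context("fork")


def child_write(arr):
    # The child inherited the mapping at fork; this writes to shared pages.
    arr[:] = 7
    arr.flush()


def memmap_demo(n=4):
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Create the mapping in the parent, prior to forking.
        arr = np.memmap(path, dtype=np.int32, mode="w+", shape=(n,))
        p = ctx.Process(target=child_write, args=(arr,))
        p.start()
        p.join()
        result = arr.tolist()
        del arr  # drop the mapping before removing the backing file
    finally:
        os.remove(path)
    return result
```

With "fork" the Process arguments are inherited rather than pickled, which is why the child ends up with the same mapping instead of a copy.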
> (when I tried to get it to work implicitly, I ran into memory leaks, I
> think due to the files not being closed when worker processes
It is probably due to multiprocessing using os._exit instead of
sys.exit. Clean-up code is never executed. You must manually close any
file handle in the worker process. Also, a memmap array is pickled by
copying the buffer content, so if you pass it to multiprocessing.Queue,
the child gets a private copy instead of the shared-memory array.
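The difference between the two exit calls can be seen with a small standard-library experiment. This is only an illustration of os._exit skipping clean-up handlers, not a reproduction of what multiprocessing does internally; the child script and helper name are made up for the example:

```python
import subprocess
import sys

# Child script: register a clean-up handler, then exit one way or the other.
CHILD_TEMPLATE = """
import atexit, os, sys
atexit.register(lambda: print("cleanup ran", flush=True))
{exit_call}
"""


def run_child(exit_call):
    code = CHILD_TEMPLATE.format(exit_call=exit_call)
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return proc.stdout.strip()


normal_exit = run_child("sys.exit(0)")  # atexit handlers run
hard_exit = run_child("os._exit(0)")    # process ends immediately, no handlers
```

In a worker this suggests closing files explicitly, for example in a try/finally around the work, rather than relying on interpreter shutdown.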
> I read that parts of numpy internally use multithreading to avoid the
> global interpreter lock. Which parts are that and how is it triggered?
> Specifically is there a way to run numerical expressions on large
> arrays in parallel (each thread working on a part of the array)? I am
> doing things like
> exp(special.gammaln(arr1 * x) - arr2)
There is a multicore branch of numpy. I have never used it.
Intel MKL has multicore support. You can build NumPy against it. At
least LAPACK, BLAS (and possibly FFT) should use multiple cores.
Also note that you can use Cython with normal Python threads, and
release the GIL when working with the ndarrays in Cython. Cython has a
special syntax for numpy arrays. This is what I currently do, and why I
more or less lost my interest in shared memory. The GIL is only a
problem if you don't release it. But in Cython you can just do:
OpenMP is nice if you can put some bottlenecks in C or Fortran (ctypes,
f2py or Cython).
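For the "each thread working on a part of the array" idea from the question, a plain-Python sketch might look like the following. It assumes the NumPy functions involved release the GIL in their inner loops, which the post does not state and which may not hold for every function or version; if they do not, the threads simply run one after another. The expression is a simplified stand-in (gammaln is left out to avoid a SciPy dependency), and parallel_expr is a hypothetical helper name:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def parallel_expr(arr1, arr2, x, n_threads=4):
    # Output buffer shared by all threads; each thread fills its own slice.
    out = np.empty(arr1.shape, dtype=np.float64)
    bounds = np.linspace(0, arr1.shape[0], n_threads + 1, dtype=int)

    def work(i):
        lo, hi = bounds[i], bounds[i + 1]
        # Stand-in for exp(special.gammaln(arr1 * x) - arr2).
        np.exp(arr1[lo:hi] * x - arr2[lo:hi], out=out[lo:hi])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() forces evaluation so exceptions in workers are re-raised.
        list(pool.map(work, range(n_threads)))
    return out
```

The slices are views, so no input data is copied when the work is split; only the temporaries inside each chunk are allocated per thread.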