[SciPy-user] shared memory machines

Gael Varoquaux gael.varoquaux@normalesup....
Wed Feb 11 07:33:06 CST 2009


On Wed, Feb 11, 2009 at 01:31:53PM +0100, Sturla Molden wrote:
>   def __dealloc__(SharedMemoryBuffer self):
>      print 'Calling __dealloc__ on buffer at %s' \
>              % <unsigned long> self.mapped_address #DBG
>      self.handle.dealloc()

> Why do you do this? The Handle should self destruct. Anyway, this is 
> evil and will possibly cause multiprocessing to hang, as well as segfaults.

This was for debugging. I do not understand why my test code shows only
one call to __dealloc__ (see below), and I am trying to figure out why. I
fear this has more to do with Python's garbage collector.

I agree this is evil. However, if I don't add this code, the __dealloc__
method of the handler does not seem to get called in my example.

Here is what worries me:

I run this test code:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
import ndarray as shmem
import numpy as np

def modify_array(ary):
    ary[:3] = 1
    print  'Array address in sub program %s' % ary.ctypes.data

from multiprocessing import Pool

def main():
    a = shmem.shared_zeros(10)

    p = Pool()

    print 'Array address in main program %s' % a.ctypes.data
    print a

    job = p.apply_async(modify_array, (a, ))
    p.close()
    p.join()

    print a

main()

import gc
gc.collect()
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I get the following output:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Array address in main program 47294723575808
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
Array address in sub program 47294723575808
Calling __dealloc__ on buffer at 47294723575808
Deallocated memory at 47294723575808
[ 1.  1.  1.  0.  0.  0.  0.  0.  0.  0.]
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The two deallocation messages are debug prints that I inserted in the two
__dealloc__ methods. It seems to me that the array 'a' in the main program
has not been deallocated: the __dealloc__ method of 'a' does not get
called in the main program. I thus believe that there is a memory leak,
although I have not been able to confirm it. I have also just added a
print of the pid (not in the example above), and both calls to
__dealloc__ happen in the child process. Finally, if I do not explicitly
call __dealloc__ for the handler in the __dealloc__ of the buffer, I do
not see it being called.
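
For reference, the pid check mentioned above can be sketched in pure
Python. TracedBuffer and the events list are hypothetical stand-ins, not
part of the shared-memory module, and the immediate finalization on 'del'
relies on CPython's reference counting:

```python
import os

events = []  # (pid, message) pairs recorded by the finalizer

class TracedBuffer(object):
    """Hypothetical pure-Python stand-in for the shared buffer."""

    def __init__(self, address):
        self.address = address

    def __del__(self):
        # Tag every finalizer message with the pid so that parent and
        # child processes can be told apart in the output.
        message = 'finalizing buffer at %s' % self.address
        events.append((os.getpid(), message))
        print('[pid %d] %s' % (os.getpid(), message))

buf = TracedBuffer(1234)
del buf  # last reference dropped: CPython runs __del__ right away
```

If a message with the parent's pid never shows up, the parent's object
was not finalized before the output was captured.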

So I am wondering whether we are being tricked by the fact that Python
calls the __del__ method lazily, in particular when quitting. Maybe the
solution to this problem is to add an exit hook. That seems to be what
other people did when faced with this problem:
http://www.python.org/search/hypermail/python-recent/0635.html (the
follow-up is also interesting:
http://www.python.org/search/hypermail/python-recent/0636.html). However,
this is not terribly robust. I wonder how multiprocessing deals with this
problem.

By the way, I have just found a trivial bug: if I call shared_zeros with
1e5 as an argument, the code does not realise it should process this as
an int. I suggest that shared_empty also accepts floats in the 'magic'
cast from numbers to tuple for the shape, as this is what numpy does.
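
Something along these lines is what I have in mind. normalize_shape is a
hypothetical helper name, and the choice to reject non-integral floats
such as 2.5 is my assumption, not existing behaviour of the module:

```python
import numbers

def normalize_shape(shape):
    """Return `shape` as a tuple of non-negative ints (hypothetical helper).

    Accepts a single number (int or integral float such as 1e5) or a
    sequence of such numbers.
    """
    if isinstance(shape, numbers.Real):
        shape = (shape,)
    dims = []
    for dim in shape:
        as_int = int(dim)
        # Refuse values that would silently lose information (2.5)
        # or that make no sense as a dimension (-1).
        if as_int != dim or as_int < 0:
            raise ValueError('invalid dimension: %r' % (dim,))
        dims.append(as_int)
    return tuple(dims)
```

With this, normalize_shape(1e5) and normalize_shape(100000) give the same
one-element tuple, and a tuple such as (2, 3.0) is passed through as ints.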

Gaël

