[Numpy-discussion] pickling/unpickling numpy.void and numpy.record for multiprocessing

Martin Spacek numpy@mspacek.mm...
Fri Feb 26 16:41:25 CST 2010

I have a 1D structured ndarray with several different fields in the dtype. I'm 
using multiprocessing.Pool.map() to iterate over this structured ndarray, 
passing one entry (of type numpy.void) at a time to the function to be called by 
each process in the pool. After much confusion about why this wasn't working, I 
finally realized that unpickling a previously pickled numpy.void results in 
garbage data. Here's an example:

 >>> import numpy as np
 >>> x = np.zeros((2,), dtype=('i4,f4,a10'))
 >>> x[:] = [(1,2.,'Hello'), (2,3.,"World")]
 >>> x
array([(1, 2.0, 'Hello'), (2, 3.0, 'World')],
       dtype=[('f0', '<i4'), ('f1', '<f4'), ('f2', '|S10')])
 >>> x[0]
(1, 2.0, 'Hello')
 >>> type(x[0])
<type 'numpy.void'>
 >>> import pickle
 >>> s = pickle.dumps(x[0])
 >>> newx0 = pickle.loads(s)
 >>> newx0
(30917960, 1.6904535998413144e-38, '\xd0\xef\x1c\x1eZ\x03\x00d')
 >>> type(newx0)
<type 'numpy.void'>
 >>> newx0.dtype
dtype([('f0', '<i4'), ('f1', '<f4'), ('f2', '|S10')])
 >>> x[0].dtype
dtype([('f0', '<i4'), ('f1', '<f4'), ('f2', '|S10')])

This also seems to be the case for recarrays with their numpy.record entries. 
I've tried using pickle and cPickle, with both the oldest and the newest 
pickling protocol. This is in numpy 1.4 on win32 and win64, and numpy 1.3 on 
32-bit linux. I'm using Python 2.6.4 in all cases. I also just tried it on 
Python 2.5.2 with numpy 1.0.4. All have the same result, although the garbage 
data is different each time.
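For what it's worth, here is a sketch of two workarounds that avoid pickling the numpy.void scalar itself: pickle a one-element slice (a real ndarray) instead, or convert the entry to a plain Python tuple with .item() before pickling. (The b'' literals are written for Python 3; behaviour of the void-scalar bug itself may differ across NumPy versions.)

```python
import pickle
import numpy as np

x = np.zeros((2,), dtype='i4,f4,a10')
x[:] = [(1, 2.0, b'Hello'), (2, 3.0, b'World')]

# Workaround 1: pickle a one-element slice, which is a real ndarray,
# instead of the numpy.void scalar x[0].
row = pickle.loads(pickle.dumps(x[0:1]))

# Workaround 2: convert the void scalar to a plain Python tuple first.
tup = pickle.loads(pickle.dumps(x[0].item()))
```

Either form round-trips the field values intact, at the cost of reconstructing the entry on the receiving side.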

I suppose numpy.void is, as its name suggests, a pointer to a specific place in memory. 
I'm just surprised that this pointer isn't dereferenced before pickling. Or is it? I'm 
not skilled at interpreting the strings returned by pickle.dumps(), but I do see the 
word "Hello" in the string, so maybe the problem is during unpickling.

I've tried doing a copy, and even a deepcopy of a structured array numpy.void 
entry, with no luck.

Is this a known limitation? Any suggestions on how I might get around this? 
Pool.map() pickles each numpy.void entry as it iterates over the structured 
array, before sending it to the next available process. My structured array only 
needs to be read from by my multiple processes (one per core), so perhaps 
there's a better way than sending copies of entries. Multithreading (using an 
implementation of a ThreadPool I found somewhere) doesn't work because I'm 
calling scipy.optimize.leastsq, which doesn't seem to release the GIL.
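Since the array only needs to be read, one way to avoid sending entries at all is to hand each worker its own copy of the whole array once (whole ndarrays pickle correctly) and map over indices rather than numpy.void entries. A minimal sketch, assuming fork-based multiprocessing on Linux; the names _init, _work and run_demo are made up for illustration, and the real per-entry work would be the leastsq call:

```python
import multiprocessing
import numpy as np

_data = None  # per-worker copy of the structured array

def _init(arr):
    # Runs once in each worker process; arr is a full ndarray,
    # which (unlike a numpy.void scalar) survives pickling intact.
    global _data
    _data = arr

def _work(i):
    # Index into the worker-local copy instead of unpickling a numpy.void.
    rec = _data[i]
    return int(rec['f0']) * 2  # stand-in for the real per-entry computation

def run_demo():
    x = np.zeros((2,), dtype='i4,f4,a10')
    x[:] = [(1, 2.0, b'Hello'), (2, 3.0, b'World')]
    with multiprocessing.Pool(2, initializer=_init, initargs=(x,)) as pool:
        return pool.map(_work, range(len(x)))

if __name__ == '__main__':
    print(run_demo())
```

Only small integer indices cross the process boundary per task, which also avoids repeatedly serializing entry data for every call.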
