[Numpy-discussion] pickling/unpickling numpy.void and numpy.record for multiprocessing
Robert Kern
robert.kern@gmail....
Fri Feb 26 17:02:36 CST 2010
On Fri, Feb 26, 2010 at 16:41, Martin Spacek <numpy@mspacek.mm.st> wrote:
> I have a 1D structured ndarray with several different fields in the dtype. I'm
> using multiprocessing.Pool.map() to iterate over this structured ndarray,
> passing one entry (of type numpy.void) at a time to the function to be called by
> each process in the pool. After much confusion about why this wasn't working, I
> finally realized that unpickling a previously pickled numpy.void results in
> garbage data. Here's an example:
>
> >>> import numpy as np
> >>> x = np.zeros((2,), dtype=('i4,f4,a10'))
> >>> x[:] = [(1,2.,'Hello'), (2,3.,"World")]
> >>> x
> array([(1, 2.0, 'Hello'), (2, 3.0, 'World')],
> dtype=[('f0', '<i4'), ('f1', '<f4'), ('f2', '|S10')])
> >>> x[0]
> (1, 2.0, 'Hello')
> >>> type(x[0])
> <type 'numpy.void'>
> >>> import pickle
> >>> s = pickle.dumps(x[0])
> >>> newx0 = pickle.loads(s)
> >>> newx0
> (30917960, 1.6904535998413144e-38, '\xd0\xef\x1c\x1eZ\x03\x00d')
> >>> s
> "cnumpy.core.multiarray\nscalar\np0\n(cnumpy\ndtype\np1\n(S'V18'\np2\nI0\nI1\ntp3\nRp4\n(I4\nS'|'\np5\nN(S'f0'\np6\nS'f1'\np7\nS'f2'\np8\ntp9\n(dp10\ng6\n(g1\n(S'i4'\np11\nI0\nI1\ntp12\nRp13\n(I4\nS'<'\np14\nNNNI-1\nI-1\nI0\nNtp15\nbI0\ntp16\nsg7\n(g1\n(S'f4'\np17\nI0\nI1\ntp18\nRp19\n(I4\nS'<'\np20\nNNNI-1\nI-1\nI0\nNtp21\nbI4\ntp22\nsg8\n(g1\n(S'S10'\np23\nI0\nI1\ntp24\nRp25\n(I4\nS'|'\np26\nNNNI10\nI1\nI0\nNtp27\nbI8\ntp28\nsI18\nI1\nI0\nNtp29\nbS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00@Hello\\x00\\x00\\x00\\x00\\x00'\np30\ntp31\nRp32\n."
> >>> type(newx0)
> <type 'numpy.void'>
> >>> newx0.dtype
> dtype([('f0', '<i4'), ('f1', '<f4'), ('f2', '|S10')])
> >>> x[0].dtype
> dtype([('f0', '<i4'), ('f1', '<f4'), ('f2', '|S10')])
> >>> np.version.version
> '1.4.0'
>
> This also seems to be the case for recarrays with their numpy.record entries.
> I've tried using pickle and cPickle, with both the oldest and the newest
> pickling protocol. This is in numpy 1.4 on win32 and win64, and numpy 1.3 on
> 32-bit linux. I'm using Python 2.6.4 in all cases. I also just tried it on
> Python 2.5.2 with numpy 1.0.4. All have the same result, although the garbage
> data is different each time.
>
> I suppose numpy.void is, as its name suggests, a pointer to a specific place in memory.
No, it isn't. It's just a base dtype for all of the ad-hoc dtypes that
are created, for example, for record arrays.
> I'm just surprised that this pointer isn't dereferenced before pickling. Or is
> it? I'm not skilled in interpreting the strings returned by pickle.dumps(). I do
> see the word "Hello" in the string, so maybe the problem is during unpickling.
Use pickletools.dis() on the string. It helps to understand what is
going on. The data string is definitely correct:
In [25]: t = '\x01\x00\x00\x00\x00\x00\x00@Hello\x00\x00\x00\x00\x00'
In [29]: np.fromstring(t, x.dtype)
Out[29]:
array([(1, 2.0, 'Hello')],
dtype=[('f0', '<i4'), ('f1', '<f4'), ('f2', '|S10')])
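For completeness, here is a self-contained sketch of the pickletools.dis() inspection suggested above, rebuilt from the session (with 'S10' as the modern spelling of the 'a10' string dtype; the variable names are mine):

```python
import io
import pickle
import pickletools

import numpy as np

# Rebuild the structured array from the session above.
x = np.zeros((2,), dtype='i4,f4,S10')
x[:] = [(1, 2.0, 'Hello'), (2, 3.0, 'World')]

# Disassemble the pickle stream of a single record.  The listing shows
# an opcode naming the numpy scalar reconstructor in the multiarray
# module, followed by an opcode carrying the raw 18-byte record data
# (4 bytes i4 + 4 bytes f4 + 10 bytes S10).
buf = io.StringIO()
pickletools.dis(pickle.dumps(x[0]), out=buf)
listing = buf.getvalue()
print(listing)
```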
The implementation of numpy.core.multiarray.scalar is doing something wrong.
> I've tried doing a copy, and even a deepcopy of a structured array numpy.void
> entry, with no luck.
>
> Is this a known limitation?
Nope. New bug! Thanks!
> Any suggestions on how I might get around this?
> Pool.map() pickles each numpy.void entry as it iterates over the structured
> array, before sending it to the next available process. My structured array only
> needs to be read from by my multiple processes (one per core), so perhaps
> there's a better way than sending copies of entries. Multithreading (using an
> implementation of a ThreadPool I found somewhere) doesn't work because I'm
> calling scipy.optimize.leastsq, which doesn't seem to release the GIL.
Pickling of complete arrays works. A quick workaround would be to send
rank-0 scalars (note that Pool.map() takes the worker function as its
first argument):
Pool.map(func, map(np.asarray, x))
Or just tuples:
Pool.map(func, map(tuple, x))
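A sketch of both workarounds, with the Pool machinery elided: a plain pickle round-trip is enough to show that rank-0 arrays and tuples survive, which is what matters once Pool.map() hands each item to a worker process.

```python
import pickle

import numpy as np

# Same structured array as in the original report ('S10' instead of 'a10').
x = np.zeros((2,), dtype='i4,f4,S10')
x[:] = [(1, 2.0, 'Hello'), (2, 3.0, 'World')]

# Workaround 1: wrap each record in a rank-0 array.  Whole-array
# pickling round-trips correctly, and np.asarray() on a void scalar
# produces a 0-d array with the same structured dtype.
rec = pickle.loads(pickle.dumps(np.asarray(x[0])))
print(rec, rec.dtype)

# Workaround 2: convert each record to a plain tuple before mapping,
# which any pickle protocol handles.
tup = pickle.loads(pickle.dumps(tuple(x[0])))
print(tup)
```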
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco