[IPython-User] Parallel question: Sending data directly between engines

Gael Varoquaux gael.varoquaux@normalesup....
Wed Jan 25 16:24:36 CST 2012

On Wed, Jan 25, 2012 at 03:57:34PM -0600, Matthew Rocklin wrote:
> I see in PR 1295 you're pickling all transmitted objects. Is there a
> cleaner/faster way of doing this for arrays?
> Using cPickle on a 1000 by 1000 array of float32s takes about a second on
> my machine.

Indeed, cPickle is surprisingly slow:

    In [2]: a = np.random.random(size=(1000, 1000)).astype(np.float32)

    In [3]: %timeit cPickle.dumps(a)
    1 loops, best of 3: 672 ms per loop
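Much of that cost is the pickle protocol: the default protocol 0 writes a slow text representation, while protocol 2 (or `pickle.HIGHEST_PROTOCOL`) stores the array's raw buffer as a binary string. A minimal round-trip sketch, using plain `pickle` rather than IPython's own machinery:

```python
import pickle

import numpy as np

# Sketch: pickle a float32 array with the highest (binary) protocol.
# Protocol >= 2 serializes the array's raw buffer directly instead of
# a text encoding, so dumps() is dominated by a memcpy of the data.
a = np.random.random(size=(1000, 1000)).astype(np.float32)
payload = pickle.dumps(a, protocol=pickle.HIGHEST_PROTOCOL)
b = pickle.loads(payload)

assert (a == b).all()            # lossless round-trip
assert len(payload) >= a.nbytes  # the 4 MB buffer plus a small header
```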

This is nowhere close to any fundamental limit. (Strictly speaking, the
%timeit below only measures looking up the bound __reduce__ method, not
calling it, but even an actual call is far cheaper than cPickle.dumps:)

    In [4]: %timeit a.__reduce__
    10000000 loops, best of 3: 126 ns per loop
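Relatedly, the raw bytes behind an array can be reached with no copy at all, for instance through a memoryview. A small sketch:

```python
import numpy as np

# Sketch: expose an array's buffer without copying it.
a = np.zeros((1000, 1000), dtype=np.float32)
view = memoryview(a)  # zero-copy view over the 4 MB buffer

assert view.nbytes == a.nbytes == 4_000_000

# Writes to the array are visible through the view: nothing was duplicated.
a[0, 0] = 1.0
assert view[0, 0] == 1.0
```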

Also, as suggested by Olivier, joblib is a bit more clever than pickle,
although it writes to files:

    In [6]: from joblib import dump

    In [7]: %timeit dump(a, 'tmp.pkl')
    10 loops, best of 3: 138 ms per loop

Even more surprising, joblib with compression enabled is still faster
than cPickle was without it:

    In [8]: %timeit dump(a, 'tmp.pkl', compress=1)
    1 loops, best of 3: 489 ms per loop

Note that compression gains little here, as the array has very high
entropy. With a low-entropy array:

    In [9]: a[...] = 0

    In [10]: %timeit dump(a, 'tmp.pkl', compress=1)
    10 loops, best of 3: 50.9 ms per loop
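The entropy effect is easy to reproduce with the standard library's zlib (used here as a stand-in for joblib's compressor): random float32 bytes barely shrink, while a zeroed buffer collapses.

```python
import zlib

import numpy as np

# Sketch: compression ratio depends almost entirely on entropy.
high = np.random.random(size=(1000, 1000)).astype(np.float32).tobytes()
low = np.zeros((1000, 1000), dtype=np.float32).tobytes()

c_high = zlib.compress(high, 1)  # random mantissas: nearly incompressible
c_low = zlib.compress(low, 1)    # constant data: collapses to almost nothing

assert len(c_high) > 0.5 * len(high)
assert len(c_low) < 0.01 * len(low)
```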

If people are interested, we could integrate a patch that makes joblib
work in memory. Of course, the danger is blowing up memory when working
with big data: the data will inevitably be at least partially duplicated
during serialization.
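As a sketch of the in-memory idea (using numpy's own .npy format and io.BytesIO rather than an actual joblib patch), note the extra copy that getvalue() creates, which is exactly the duplication danger above:

```python
import io

import numpy as np

a = np.random.random(size=(1000, 1000)).astype(np.float32)

# Serialize into an in-memory buffer instead of a file on disk.
buf = io.BytesIO()
np.save(buf, a)
payload = buf.getvalue()  # a second copy of the ~4 MB of data lives here

# Deserialize from the same buffer.
buf.seek(0)
b = np.load(buf)

assert (a == b).all()
```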

