[IPython-User] Parallel question: Sending data directly between engines

Olivier Grisel olivier.grisel@ensta....
Wed Jan 25 16:43:11 CST 2012


2012/1/25 Olivier Grisel <olivier.grisel@ensta.org>:
> 2012/1/25 MinRK <benjaminrk@gmail.com>:
>> See this pyzmq example for non-copying sends/recvs of numpy arrays:
>> https://github.com/zeromq/pyzmq/blob/master/examples/serialization/serialsocket.py#L33
>>
>> You will have to write your own serialize/deserialize functions,
>> depending on your data structures, but that example shows that simple
>> numpy arrays are trivial.
>
> It might thus be possible to write a ZMQ-aware Pickler implementation
> that avoids memory copies: it would stream the arrays efficiently yet
> transparently, and copy only the Python object structure boilerplate
> as a prefix in the packed message. That might require a bit of work to
> get right, though.
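
To make the quoted idea concrete, here is a minimal sketch of the
"structure as a prefix, raw buffers after it" layout. It assumes
Python 3.8 or later, where pickle protocol 5 supports out-of-band
buffers; that is a swapped-in standard-library mechanism, not a ZMQ-aware
Pickler. `Block`, `pack` and `unpack` are hypothetical names, and `Block`
is only a stand-in for an array-like object. The resulting list of frames
is the kind of thing one could hand to a multipart send (for instance
pyzmq's `send_multipart` with `copy=False`), which is not exercised here.

```python
import pickle


class Block:
    """Hypothetical stand-in for a large array: a name plus a byte buffer."""

    def __init__(self, name, data):
        self.name = name
        self.data = data  # any object supporting the buffer protocol

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Offer the raw bytes to the pickler as a separate buffer
            # instead of embedding them in the pickle stream.
            return Block, (self.name, pickle.PickleBuffer(self.data))
        return Block, (self.name, bytes(self.data))


def pack(obj):
    """Return [header, buf1, buf2, ...]: object structure first, bulk data after."""
    buffers = []
    # list.append returns None, which tells pickle to keep the buffer out-of-band.
    header = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    return [header] + [b.raw() for b in buffers]


def unpack(frames):
    """Rebuild the object from the header frame and the raw buffer frames."""
    return pickle.loads(frames[0], buffers=frames[1:])
```

With this layout the header stays small regardless of how large the
buffers are, and the buffer frames are memoryviews over the original
memory rather than copies.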

Another alternative would be to work with memory-mapped NumPy arrays
and make ZMQ leverage the `sendfile` [1] system call, letting the kernel
handle the buffering between the filesystem, memory and the network by
itself, transparently for the Python code running on both ends of the
wire.

AFAIK sendfile is platform-specific and ZMQ does not leverage it, so it
will not be a general-purpose solution. It might still make sense in an
HPC setting where peer platforms are homogeneous.
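
For illustration only, here is a rough sketch of what handing a file to
the kernel looks like from Python. It bypasses ZMQ entirely and uses a
plain stream socket with the standard library's `socket.sendfile()`
(Python 3.5+), which uses `os.sendfile()` where the platform provides it
and falls back to ordinary `send()` calls otherwise. The helper names
`send_file`, `recv_exactly` and `demo` are hypothetical, and the temporary
file merely stands in for the backing file of a memory-mapped array.

```python
import os
import socket
import tempfile


def send_file(sock, path):
    """Send the whole file at `path` over a connected stream socket."""
    with open(path, "rb") as f:
        # Returns the total number of bytes sent.
        return sock.sendfile(f)


def recv_exactly(sock, nbytes):
    """Read exactly `nbytes` from `sock` into one preallocated buffer."""
    buf = bytearray(nbytes)
    view = memoryview(buf)
    received = 0
    while received < nbytes:
        n = sock.recv_into(view[received:])
        if n == 0:
            raise ConnectionError("peer closed before all bytes arrived")
        received += n
    return buf


def demo(nbytes=8192):
    """Round-trip a small file through a local socket pair."""
    payload = os.urandom(nbytes)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    a, b = socket.socketpair()
    try:
        # The payload is kept small so it fits in the socket buffer
        # without needing a concurrent reader.
        sent = send_file(a, path)
        data = recv_exactly(b, sent)
    finally:
        a.close()
        b.close()
        os.unlink(path)
    return payload, bytes(data)
```

On the receiving end the bytes land in a single preallocated buffer; in a
real setup that buffer could itself be a memory-mapped region, which this
sketch does not attempt.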

[1] http://linux.die.net/man/2/sendfile

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

