[IPython-User] Parallel direct view pull and bandwidth
Sun Aug 7 22:31:07 CDT 2011
These are indeed interesting numbers, thanks for running them! The one
thing to be careful of is that IPython uses JSON to serialize by default.
If you are on Python 2.6, the stdlib json is *extremely* slow. IPython
will prefer jsonlib/jsonlib2 or more recent simplejson if you have them, all
of which are significantly faster. If you are concerned with serialization
performance, you can specify a different serialization scheme, such as
cPickle, which you can activate with:
c.Session.packer='pickle'
in your config files or on the command-line (you will then also have to specify `rc = Client(packer='pickle')` when you create your Client).
When I am checking performance limits, I tend to use
which I enable with:
in my config files.
(again, now you have to do: `rc = Client(packer='msgpack.packb',
unpacker='msgpack.unpackb')`. I will get the config properly hooked up to the
Client soon, so it will inherit correctly from the controller.)
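The reason the Client has to be told about the packer is that both ends of a message must agree on the serialization scheme. A minimal plain-Python sketch of that constraint (Python 3, stdlib json and pickle only), assuming nothing about IPython's internals: the `SCHEMES` registry and the `send_and_receive` helper are hypothetical stand-ins for a Session's packer/unpacker pair, not IPython's API.

```python
import json
import pickle


def json_pack(obj):
    """Serialize obj to UTF-8 encoded JSON bytes."""
    return json.dumps(obj).encode("utf-8")


def json_unpack(buf):
    """Decode UTF-8 JSON bytes back into Python objects."""
    return json.loads(buf.decode("utf-8"))


def pickle_pack(obj):
    """Serialize obj with the highest pickle protocol available."""
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def pickle_unpack(buf):
    """Load an object from pickle bytes (only do this with trusted data)."""
    return pickle.loads(buf)


# Hypothetical registry (not IPython's API): scheme name -> (pack, unpack).
SCHEMES = {
    "json": (json_pack, json_unpack),
    "pickle": (pickle_pack, pickle_unpack),
}


def send_and_receive(obj, sender_scheme, receiver_scheme):
    """Pack obj with the sender's scheme, then unpack it with the receiver's."""
    pack, _ = SCHEMES[sender_scheme]
    _, unpack = SCHEMES[receiver_scheme]
    return unpack(pack(obj))


if __name__ == "__main__":
    msg = {"names": ["FOO"], "values": [0.0, 1.5]}
    print(send_and_receive(msg, "pickle", "pickle"))  # round-trips unchanged
    try:
        send_and_receive(msg, "pickle", "json")
    except ValueError as exc:  # UnicodeDecodeError is a ValueError subclass
        print("mismatched schemes:", type(exc).__name__)
```

Packing with pickle and unpacking as JSON fails because pickle bytes are not valid UTF-8 JSON text; this is roughly the situation you would be in if only one side of the connection were reconfigured.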
Serializing a list of Python floats is more a test of the message serialization
scheme than of IPython's throughput, because pretty much the whole time will be
spent building a giant JSON list (pickle should be much faster). If you
really want to test the raw throughput of IPython+ØMQ, you should try
sending numpy arrays, which are supported with zero-copy sends, which allow
us to reach ~Gb limits on unimpressive laptops:
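from IPython.parallel import Client
rc = Client()
dview = rc[:]
dview.execute("import numpy", block=True) # make numpy available on the engines
dview.execute("foo=numpy.random.random(62500)", block=True) # 62500 8-byte floats = 500kB
%time dview.pull('foo', block=True);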
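The difference between pulling a list of Python floats and pulling a numpy array can be sketched with the standard library alone (Python 3). As an assumption, an `array.array('d')` stands in for a numpy float64 array, since both are contiguous buffers of 8-byte doubles; this is not IPython code. The list has to be turned into a new JSON string or pickle before it can be sent, while the array's existing bytes are exposed directly through `memoryview`, which is roughly what a zero-copy send hands to the transport.

```python
import array
import json
import pickle
import random
import timeit

N = 62500  # 62500 * 8-byte floats = 500 kB, as in the examples in this thread

values = [random.random() for _ in range(N)]


# Path 1: a plain list must be serialized into a brand-new str/bytes object.
def as_json(seq):
    return json.dumps(seq)


def as_pickle(seq):
    return pickle.dumps(seq, protocol=pickle.HIGHEST_PROTOCOL)


# Path 2: a contiguous buffer of C doubles (stand-in for a numpy float64 array,
# not IPython code). memoryview exposes the existing bytes; nothing is copied.
buf = array.array("d", values)
view = memoryview(buf)

if __name__ == "__main__":
    print("raw buffer :", view.nbytes, "bytes")
    print("JSON text  :", len(as_json(values)), "characters")
    print("pickle     :", len(as_pickle(values)), "bytes")
    for name, fn in [("json", as_json), ("pickle", as_pickle)]:
        # Best of 3 runs of 10 serializations; timings depend on the machine.
        best = min(timeit.repeat(lambda: fn(values), number=10, repeat=3))
        print("%-6s %.4f s per 10 serializations" % (name, best))
```

Note that Python 3's stdlib json is C-accelerated, so the json/pickle gap you see here will likely be smaller than with the Python 2.6 json discussed in this thread.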
On Sun, Aug 7, 2011 at 19:38, RICHARD Georges-Emmanuel wrote:
> Hi Minrk,
> First of all, congratulations to the whole IPython team for the great work
> you did with the 0.11 release and ZMQ 2.1.7. I'm a fan.
> I tried the parallel interface with direct views; it's great.
> With machine "A" (192.168.1.4):
> 1) ipcontroller --ip='*'
> From machine "A" I remotely start an ipengine on machine "B" (192.168.1.200):
> 2) ssh email@example.com ipengine
> (I run step 2 twice to get 2 ipengines; I also tried locally with
> only machine "A".)
> Then I start ipython to create a client, since I want to evaluate the
> bandwidth (and latency in a second step):
> import time
> from IPython.parallel import Client
> rc = Client()
> dview = rc[:]
> dview.execute("FOO=[0.0 for i in xrange(62500)]", block=True) # 62500 * float64 -> 500kB of data to transfer
> T=time.time();tmp=dview.pull('FOO',block=True);print time.time() - T # for 2 engines
> T=time.time();tmp=dview.pull('FOO',0,block=True);print time.time() - T # for 1 engine (target 0)
> With machine "A" as controller and machine "B" running the 2 ipengines,
> on a 100Mb/s network (12.5MB/s):
> pull FOO from machine "B", 2 engines: 9.03 seconds (2*500kB/9.03 => 110kB/s)
> pull FOO from machine "B", 1 engine: 4.7 seconds (500kB/4.7 => ~106kB/s)
> With machine "A" as both controller and 2 ipengines, on the local machine:
> pull FOO from machine "A", 2 engines: 3.4 seconds (2*500kB/3.4 => 294kB/s)
> pull FOO from machine "A", 1 engine: 2.7 seconds (500kB/2.7 => ~185kB/s)
> I guess I'm doing something wrong, or misusing something. Any hint
> would be appreciated; in any case I will continue to dig in.
> Machines "A" and "B" run a RHEL5-flavoured distro, with Python
> 2.6 and IPython 0.11 installed from source.
> Machine "A" is a quad-core 2.6GHz
> Machine "B" is an AMD64 3000+ 1.8GHz (pretty old but still alive)
> RICHARD Georges-Emmanuel
> CEO - Electronic and Computer Engineer
> 遠大電子有限公司 (Unified Business No. 24470425)
> Mobile: +886930319433
> Phone: +88635735463
> IPython-User mailing list
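The measurements in the quoted message can be made a little more robust, and the throughput figures reproduced, with a short Python 3 sketch. `best_time`, `throughput_kb_per_s` and `link_capacity_kb_per_s` are hypothetical helpers, not part of IPython: the idea is to repeat each pull and keep the fastest run instead of relying on a single time.time() difference, then divide the data moved by the elapsed time. With a live DirectView it would be called as `best_time(lambda: dview.pull('FOO', block=True))`.

```python
import timeit


# Hypothetical helpers (not part of IPython).
def best_time(fn, repeat=5):
    """Call fn() `repeat` times and return the fastest duration in seconds.

    Keeping the minimum filters out one-off noise (other processes, warm-up
    on the first call) that a single time.time() difference cannot.
    """
    timings = []
    for _ in range(repeat):
        start = timeit.default_timer()
        fn()
        timings.append(timeit.default_timer() - start)
    return min(timings)


def throughput_kb_per_s(n_engines, kb_per_engine, seconds):
    """Aggregate throughput: total kB pulled from all engines / elapsed seconds."""
    return n_engines * kb_per_engine / seconds


def link_capacity_kb_per_s(mbit_per_s):
    """Convert megabits/s to kilobytes/s (decimal units, 8 bits per byte)."""
    return mbit_per_s * 1000 / 8


# The timings reported in the quoted message (500 kB per engine).
measured = {
    "remote, 2 engines": throughput_kb_per_s(2, 500, 9.03),
    "remote, 1 engine": throughput_kb_per_s(1, 500, 4.7),
    "local, 2 engines": throughput_kb_per_s(2, 500, 3.4),
    "local, 1 engine": throughput_kb_per_s(1, 500, 2.7),
}
capacity = link_capacity_kb_per_s(100)  # the 100 Mb/s network

if __name__ == "__main__":
    for label, rate in measured.items():
        print("%-17s %6.1f kB/s" % (label, rate))
    print("link capacity     %6.1f kB/s" % capacity)
    # Local stand-in for a blocking pull, just to exercise best_time.
    print("best of 5: %.6f s" % best_time(lambda: sum(range(100000))))
```

For the remote runs this works out to roughly 110 kB/s against a 12500 kB/s link, i.e. under 1% of the available bandwidth, which is consistent with the reply's point that the time goes into serializing the list rather than into the network.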