[IPython-User] Parallel direct view pull and bandwidth
RICHARD Georges-Emmanuel
perspective.electronic@gmail....
Mon Aug 8 03:25:34 CDT 2011
Minrk,
Thanks for the hint,
Indeed, using a numpy array did the trick and unleashed the bandwidth. I
can confirm with a numpy array of 62500 float64 values,
in the case where machine"A" runs the controller and 2 ipengines:
pull FOO from machine"A" 2 engines 8.27 ms (2*500kB/8.27e-3 s => 120 MB/s)
pull FOO from machine"A" 1 engine 7.26 ms (500kB/7.26e-3 s => 68 MB/s)
That looks much better,
and with a numpy array of 500000 float64 values:
pull FOO from machine"A" 2 engines 44 ms (~181 MB/s)
pull FOO from machine"A" 1 engine 23 ms (~173 MB/s)
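
For reference, the throughput figures above are plain arithmetic: payload size times number of engines, divided by elapsed time. Here is a minimal sketch of that calculation, assuming 8 bytes per float64 and decimal megabytes (1 MB = 1e6 bytes). The helper name throughput_mb_per_s is made up for illustration and is not part of IPython.

```python
def throughput_mb_per_s(n_floats, n_engines, seconds, bytes_per_float=8):
    """Return the aggregate transfer rate in MB/s (1 MB = 1e6 bytes)."""
    total_bytes = n_floats * bytes_per_float * n_engines
    return total_bytes / float(seconds) / 1e6

# (number of floats, number of engines, measured pull time in seconds)
measurements = [
    (62500, 2, 8.27e-3),   # about 120.9 MB/s
    (62500, 1, 7.26e-3),   # about 68.9 MB/s
    (500000, 2, 44e-3),    # about 181.8 MB/s
    (500000, 1, 23e-3),    # about 173.9 MB/s
]

for n_floats, n_engines, seconds in measurements:
    rate = throughput_mb_per_s(n_floats, n_engines, seconds)
    print("%6d floats, %d engine(s): %.1f MB/s" % (n_floats, n_engines, rate))
```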
I also tried switching to 'pickle', but it doesn't change things. I
configured it on the command line,
on a local machine setup with 4 xterm consoles:
ipcontroller --ip='*' --Session.packer='pickle'
ipengine --Session.packer='pickle'
ipengine --Session.packer='pickle'
ipython --Session.packer='pickle'
then from ipython:
import time
from IPython.parallel import Client
rc = Client(packer='pickle')
dview=rc[:]
# 62500 * float 64 -> 500kB of data to transfer
dview.execute("FOO=[0.0 for i in xrange(62500)]",block=True)
[None,None]
T=time.time();tmp=dview.pull('FOO');print time.time() - T # for 2 ipengines
3.4 s
T=time.time();tmp=dview.pull('FOO',0);print time.time() - T # for 1 ipengine
2.5 s
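
To see how much time goes purely into serializing the list, independent of IPython and the network, one can time the serializers directly. This is only a rough standalone sketch using the stdlib json and pickle modules. It does not reproduce how IPython's Session packs messages, and the helper time_call is a hypothetical name introduced here.

```python
import json
import pickle
import time

def time_call(func, *args):
    """Call func(*args) once and return (elapsed_seconds, result)."""
    start = time.time()
    result = func(*args)
    return time.time() - start, result

# Same payload as FOO above: 62500 Python floats in a plain list.
foo = [0.0 for _ in range(62500)]

json_s, json_payload = time_call(json.dumps, foo)
pickle_s, pickle_payload = time_call(pickle.dumps, foo, pickle.HIGHEST_PROTOCOL)

print("json:   %.4f s, %d bytes" % (json_s, len(json_payload)))
print("pickle: %.4f s, %d bytes" % (pickle_s, len(pickle_payload)))
```

If both timings come out far below the multi-second pull times, the serializer alone probably does not explain the difference, which would be consistent with the switch to 'pickle' not changing the numbers.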
I will continue to play around; lots of things to learn anyway.
Thanks again,
Joe
On 08/08/2011 11:31, MinRK wrote:
> These are indeed interesting numbers, thanks for running them! The
> one thing to be careful of is that IPython uses JSON to serialize by
> default. If you are on Python 2.6, the stdlib json is *extremely*
> slow. IPython will prefer jsonlib/jsonlib2 or more recent simplejson
> if you have them, all of which are significantly faster. If you are
> concerned with serialization performance, you can specify a different
> serialization scheme, such as cPickle, which you can activate with:
>
> c.Session.packer='pickle' in your config files or on the command-line
> (you will then also have to specify `rc = Client(packer='pickle')` when
> you create your Client)
>
> When I am checking performance limits, I tend to use msgpack
> <http://msgpack.org/>, which I enable with:
>
> c.Session.packer='msgpack.packb'
> c.Session.unpacker='msgpack.unpackb'
>
> in my config files.
>
> (again, now you have to do: `rc = Client(packer='msgpack.packb',
> unpacker='msgpack.unpackb')`. I will get the config properly hooked up
> to the Client soon, so it will inherit correctly from the controller)
>
> Serializing a list of ints is more a test of the message serialization
> scheme than IPython's throughput, because pretty much the whole time
> will be spent making a giant JSON list (pickle should be much faster).
> If you really want to test the raw throughput of IPython+ØMQ, you
> should try sending numpy arrays, which are supported with zero-copy
> sends, that allow us to reach ~Gb limits on unimpressive laptops:
>
> from IPython.parallel import Client
> rc = Client()
> dview = rc[:]
> with dview.sync_imports():
>     import numpy
> # 8-byte floats
> dview.execute("foo=numpy.random.random(62500)", block=True)
> %time dview.pull('foo', block=True);
>
> -MinRK
>
> On Sun, Aug 7, 2011 at 19:38, RICHARD Georges-Emmanuel
> <perspective.electronic@gmail.com> wrote:
>
> Hi Minrk,
>
> first of all, congratulations to all the IPython team for the great work
> you did with the release 0.11, and ZMQ 2.1.7. I'm a fan.
>
> I tried the parallel with direct views, that's great.
>
> with a machine A (192.168.1.4)
> 1) ipcontroller --ip='*'
> from machine"A" I remote start ipengine on machine"B" (192.168.1.200)
> 2) ssh root@192.168.1.200 ipengine
> --file=/sharedMachineAfs/root/.config/ipython/profile_default/security/ipcontroller-engine.json
> &
> (I do step 2 twice to get 2 ipengines; I also tried locally with
> only machine"A")
>
> then I start ipython to start a client, and I want to evaluate the
> bandwidth (and latency in a second step).
>
> import time
> from IPython.parallel import Client
> rc = Client()
> dview=rc[:]
> # 62500 * float 64 -> 500kB of data to transfer
> dview.execute("FOO=[0.0 for i in xrange(62500)]",block=True)
> [None,None]
> T=time.time();tmp=dview.pull('FOO');print time.time() - T # for 2 ipengines
> T=time.time();tmp=dview.pull('FOO',0);print time.time() - T # for 1 ipengine
>
> in case of machine"A" as controller and machine"B" as 2 ipengines
> pull FOO from machine"B" 2 engines 9.03 seconds (2*500kB/9.03 =>
> 110kB/s) on a 100Mb/s network (12.5MB/s)
> pull FOO from machine"B" 1 engine 4.7 seconds (500kB/4.7 => 106kB/s)
>
> in case of machine"A" as controller and as 2 ipengines
> pull FOO from machine"A" 2 engines 3.4 seconds (2*500kB/3.4 =>
> 294kB/s) on a local machine
> pull FOO from machine"A" 1 engine 2.7 seconds (500kB/2.7 => 185kB/s)
>
> I guess I'm doing something wrong, or I am misusing something. Any hint
> would be appreciated; anyway, I will continue to dig in.
>
> Machine"A" and "B" are running under a RHEL5 flavoured distro, with
> python 2.6, ipython 0.11 installed from source.
> Machine"A" is a Quad core 2.6GHz
> Machine"B" is an AMD64 3000+ 1.8GHz (pretty old but still alive)
>
> cheers.
> Joe
>
>
>
> --
> RICHARD Georges-Emmanuel
> CEO - Electronic and Computer Engineer
> perspective.electronic@gmail.com
> 遠大電子有限公司 (Unified Business No. 24470425)
> Mobile +886930319433
> Phone +88635735463
--
RICHARD Georges-Emmanuel
CEO - Electronic and Computer Engineer
perspective.electronic@gmail.com
遠大電子有限公司 (Unified Business No. 24470425)
Mobile +886930319433
Phone +88635735463