[IPython-dev] Performance sanity check: 7.21s to scatter 5000X1000 float array to 7 engines

Fernando Perez fperez.net@gmail....
Thu Jan 10 21:52:59 CST 2008


Hi Anand,

On Jan 10, 2008 8:28 PM, Anand Patil <anand.prabhakar.patil@gmail.com> wrote:
>
> On Jan 10, 2008 2:25 PM, Anand Patil <anand.prabhakar.patil@gmail.com>
> wrote:
>
> > Hi all,
> >
> > Just trying to figure out whether something is wrong with my installation:
> > I'm using mpi4py with mpich, on a machine with two quad-core 3.0 GHz Intel
> > processors. The following script takes 7.21s (wall time) with seven
> > IPEngines, with almost all of the time spent in the scatterAll() operation:
> >
> >
> > from numpy import *
> > from ipython1 import *
> > import ipython1.kernel.api as kernel
> > rc = kernel.RemoteController(('127.0.0.1',10105))
> >
> > C=ones((5000,1000),dtype=float)
> > rc.scatterAll('C',C)
> >
> > rc.resetAll()
> >
> >
> > Is this fairly typical, and if not is there anything I can do to speed up
> > the pushing and pulling?
> >
> > Thanks,
> > Anand Patil
> >
>
>
> I got similar results with OpenMPI. Here are the profiler results:
>
>  In [11]: run -p test
>           2809230 function calls in 9.891 CPU seconds
>
>     Ordered by: internal time
>
>     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>        618    6.075    0.010    6.075    0.010 {method 'recv' of '_socket.socket' objects}
>          6    1.484    0.247    1.484    0.247 <string>:1(sendall)
>          1    0.980    0.980    1.855    1.855 base64.py:285(encode)
>     701757    0.371    0.000    0.371    0.000 {binascii.b2a_base64}
>          1    0.216    0.216    0.216    0.216 {cPickle.dumps}
>     701759    0.205    0.000    0.205    0.000 {method 'read' of 'cStringIO.StringI' objects}
>         60    0.183    0.003    0.183    0.003 {method 'join' of 'str' objects}
>     702494    0.173    0.000    0.173    0.000 {method 'append' of 'list' objects}
>     701798    0.125    0.000    0.125    0.000 {len}
>          3    0.040    0.013    2.079    0.693 xmlrpclib.py:1041(dumps)
>          1    0.029    0.029    0.029    0.029 {method 'fill' of 'numpy.ndarray' objects}
>          3    0.002    0.001    9.642    3.214 xmlrpclib.py:1427(__request)
>          3    0.002    0.001    9.644    3.215 xmlrpclib.py:1146(__call__)
>          1    0.001    0.001    9.815    9.815 multienginexmlrpc.py:946(scatterAll)
>         21    0.000    0.000    6.076    0.289 socket.py:321(readline)
>          1    0.000    0.000    9.891    9.891 test.py:2(<module>)
>
>  Looks like it's pickling the array and sending it as a string. I think
> mpi4py can send numerical arrays without pickling, so perhaps I don't have
> it installed properly?

No, it's not an installation problem, it's simply that currently,
ipython doesn't have any special mpi-optimized scatter/gathers.  We've
been talking about it, and it's something that hopefully soon we'll
get to work on, but currently the controller will be a serious
bottleneck with the usage pattern you have (sending largish arrays
from the controller to every engine).

In this usage mode, the controller also becomes a bottleneck because it
has to contact all the engines serially.
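
As a rough sanity check on the profile quoted above, the arithmetic
below estimates how much data the base64 step has to chew through for
a 5000x1000 float64 array. It is only a back-of-the-envelope sketch: it
assumes the payload is essentially the raw array bytes (pickle overhead
ignored) and that base64.encode() consumes base64.MAXBINSIZE bytes per
binascii.b2a_base64 call, as the standard library module does.

```python
import base64

# Size of the array from the test script: 5000 x 1000 float64 values.
n_rows, n_cols, itemsize = 5000, 1000, 8
raw_bytes = n_rows * n_cols * itemsize

# base64.encode() reads the input in chunks of MAXBINSIZE bytes and
# calls binascii.b2a_base64 once per chunk.
chunk = base64.MAXBINSIZE
b2a_calls = -(-raw_bytes // chunk)        # ceiling division

# Base64 emits 4 output characters per 3 input bytes (newlines ignored).
encoded_bytes = 4 * -(-raw_bytes // 3)

print("raw payload:   %.1f MB" % (raw_bytes / 1e6))
print("b2a calls:     %d" % b2a_calls)
print("base64 output: %.1f MB" % (encoded_bytes / 1e6))
```

The estimated call count comes out within a handful of the 701757
b2a_base64 calls in the profile, which fits the reading that the whole
array is being pickled and base64-encoded before it goes over the
socket.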

With the existing infrastructure, the ideal usage pattern for ipython
is to be sending small messages and having your engines do significant
computation.  But anything that involves sending something big to
everybody will unfortunately be slow right now.

One question: is it possible for you to organize your code runs so
that the engines get their local data via other small parameters?
Sometimes, instead of generating a large random array and scattering
it, you can seed the local RNGs differently and do the generation
locally (on the engines), or you can have each engine read its data
over a network filesystem, etc.
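
A minimal sketch of the local-generation idea, in plain NumPy. The
function names (row_range, make_local_block), the base_seed value and
the even row split are all made up for illustration, and how the
function gets run on each engine is deliberately left out; the point is
only that each engine needs a few small integers rather than the array
itself. It assumes a NumPy recent enough to provide
numpy.random.default_rng.

```python
import numpy as np

def row_range(engine_id, n_engines, n_rows):
    """Rows [start, stop) owned by one engine, split as evenly as possible."""
    base, extra = divmod(n_rows, n_engines)
    start = engine_id * base + min(engine_id, extra)
    stop = start + base + (1 if engine_id < extra else 0)
    return start, stop

def make_local_block(engine_id, n_engines, n_rows=5000, n_cols=1000,
                     base_seed=12345):
    """Generate this engine's rows locally instead of receiving them."""
    start, stop = row_range(engine_id, n_engines, n_rows)
    # Seeding with [base_seed, engine_id] gives each engine its own
    # reproducible stream; only these small integers need to be sent.
    rng = np.random.default_rng([base_seed, engine_id])
    return rng.random((stop - start, n_cols))

# What engine 3 of 7 would compute on its own:
local = make_local_block(3, 7)
print(local.shape)
```

Each engine would call make_local_block() with its own id, so the only
traffic through the controller is the call itself.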

Another possibility is to start your engine group via mpirun and then
have, say, engine 0 do an *MPI* scatter of an array, which is then used
by the others.
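
For reference, a buffer-based scatter in mpi4py might look roughly like
the sketch below. This is standalone mpi4py code, not something ipython
does for you: the helper names (scatterv_layout, scatter_rows) are
hypothetical, and it assumes every process was started under the same
mpirun so that they share MPI.COMM_WORLD. Scatterv is used rather than
Scatter because 5000 rows do not divide evenly among 7 processes.

```python
import numpy as np

def scatterv_layout(n_rows, n_cols, size):
    """Rows per rank, plus element counts and offsets for a row-wise split."""
    base, extra = divmod(n_rows, size)
    rows = [base + (1 if r < extra else 0) for r in range(size)]
    counts = [r * n_cols for r in rows]
    displs = [sum(counts[:r]) for r in range(size)]
    return rows, counts, displs

def scatter_rows(n_rows=5000, n_cols=1000):
    """Rank 0 builds the full array; every rank returns its block of rows."""
    # Imported lazily so scatterv_layout() stays usable without MPI.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    rows, counts, displs = scatterv_layout(n_rows, n_cols, size)

    sendspec = None                      # ignored on non-root ranks
    if rank == 0:
        C = np.ones((n_rows, n_cols), dtype=np.float64)
        sendspec = [C, counts, displs, MPI.DOUBLE]

    local = np.empty((rows[rank], n_cols), dtype=np.float64)
    # Uppercase Scatterv sends the raw buffer; lowercase scatter() would
    # pickle the data, which is what we are trying to avoid.
    comm.Scatterv(sendspec, local, root=0)
    return local
```

Every process in the group has to call scatter_rows() for the
collective to complete; with 7 processes, ranks 0 and 1 end up with 715
rows each and the rest with 714.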

Let us know if any of this is useful...

Cheers,

f

