[IPython-user] task client execution time

Brian Granger ellisonbg.net@gmail....
Fri Aug 21 13:38:54 CDT 2009


On Fri, Aug 21, 2009 at 10:29 AM, Lev Givon <lev@columbia.edu> wrote:

> I was recently trying out the task client parallelization feature in
> ipython 0.10 on Linux. I noticed that the parallel map in the code
> snippet below takes far longer to execute (i.e., several seconds as
> opposed to a few milliseconds) than the serial equivalent when run
> against a local cluster of 4 engines on a quad core machine. Is this
> expected? Should I be using the task client interface in a different
> way?
>

No, you are using this correctly.  But let's think a bit more about the
performance issues here.  Before parallelizing anything, you should first
optimize the serial case.  For sin (which is obviously a trivial test case)
this is simple, but it illustrates how I think about these things.

First, I would not use map in the serial case.  Instead, pass the whole
array to np.sin itself:

In [8]: %timeit map(lambda x:sin(x), data)
100 loops, best of 3: 2.66 ms per loop

In [9]: %timeit sin(data)
10000 loops, best of 3: 21.7 us per loop

Notice that just passing the array to sin gives a massive speedup (over
100x here).  You could squeeze out slightly more by writing a Cython-based
version.
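As a rough check (a sketch, not part of the original thread), the same comparison can be scripted with timeit; the exact numbers will vary by machine, but the per-element map carries one Python-level function call per element while the vectorized call loops in compiled C:

```python
import timeit
import numpy as np

data = np.arange(500, dtype=float)

# Per-element map: one Python-level np.sin call per element.
t_map = timeit.timeit(lambda: list(map(np.sin, data)), number=100)

# Vectorized: a single call; the loop runs inside NumPy's C ufunc machinery.
t_vec = timeit.timeit(lambda: np.sin(data), number=100)

print('map: %.6f s   vectorized: %.6f s' % (t_map, t_vec))
```

Both paths compute identical results; only the call overhead differs.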

Once you have a fast serial version, you can start thinking about the
performance issues of a parallel version.  Things to keep in mind:

* Simple operations like sin are, depending on CPU details, memory bound
(or close to it) rather than CPU bound.  Thus, on a single machine, whether
you use threads or processes, and regardless of implementation language,
you won't see much speedup.

* Any solution based on processes (like IPython) pays network latency on
every task.  When you are running everything locally on a multicore
workstation, these times are:

ping localhost
round-trip min/avg/max/stddev = 0.058/0.067/0.073/0.006 ms

Immediately you see that on localhost your best-case network latency is of
the same order as (actually about 3x greater than!) the execution time of
the fast serial version.  Thus, there is *no way* you will ever see parallel
speedup here using *any* process+networking based parallel solution.
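A quick back-of-envelope check (my sketch, using the numbers above: 21.7 us for the whole serial computation, ~67 us average round trip, and one task per element as in the original snippet) makes the point concrete:

```python
serial_us = 21.7    # vectorized np.sin over all 500 elements, measured above
latency_us = 67.0   # avg localhost ping round trip, measured above
n_tasks = 500       # tc.map(lambda x: np.sin(x), data) creates one task per element

# Lower bound on the parallel run: network latency alone, with all
# computation and serialization costs assumed to be zero.
parallel_floor_us = n_tasks * latency_us

print('parallel floor is %.0fx the entire serial run'
      % (parallel_floor_us / serial_us))
```

Even a single round trip (67 us) already costs more than the entire serial computation, so 500 of them cannot possibly win.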

* For any large amount of data you also have to take into account the
bandwidth of the network connection.  Again, this is not a limitation of
IPython, but rather of the underlying hardware.

The bottom line is that it is extremely difficult to parallelize something
like sin(x) for small arrays - and even for large arrays, it is still very
tough.

What this means for IPython's parallel features is that they are focused on
use cases where you have 1) a lot of computation to do per task and 2) only
a small amount of data to send.
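One common way to move toward that regime (a sketch of mine, not something from the original thread) is to send a few large chunks instead of one task per element, so each round trip amortizes over substantial computation.  Here plain map stands in for tc.map, which you would use against a real cluster:

```python
import numpy as np

data = np.arange(500, dtype=float)

# One large chunk per engine (4 here, matching the quad-core cluster),
# instead of 500 tiny tasks that each pay a full network round trip.
chunks = np.array_split(data, 4)

# On a real cluster this would be tc.map(np.sin, chunks); plain map
# stands in so the sketch runs anywhere.
result = np.concatenate(list(map(np.sin, chunks)))
```

Chunking only helps once the per-chunk work dominates the per-task latency; for sin on 500 elements, even four tasks remain latency bound, per the ping numbers above.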

If you really want to speed up simple element-wise operations, I would look
at CorePy or at GPUs through CUDA or OpenCL.

Cheers,

Brian



>
>
>
> import numpy as np
> from time import time
>
> from IPython.kernel import client
>
> mec = client.MultiEngineClient()
> tc = client.TaskClient()
>
> mec.execute('import numpy as np')
>
> data = np.arange(500)
>
> start = time()
> result = map(lambda x:np.sin(x), data)
> stop = time()
> print 'serial execution time = %f' % (stop-start)
>
> start = time()
> result = tc.map(lambda x:np.sin(x), data)
> stop = time()
> print 'parallel execution time = %f' % (stop-start)
> _______________________________________________
> IPython-user mailing list
> IPython-user@scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-user
>
