[IPython-User] Advice re parallelizing many quick calculations

Fernando Perez fperez.net@gmail....
Sun Jul 1 15:32:49 CDT 2012


Hi Gavin,

On Thu, Jun 28, 2012 at 2:59 PM, Junkshops <junkshops@gmail.com> wrote:
> Hi all,
>
> Total noob to the iPython notebook and parallelization here, but so far
> it's incredibly cool. My only minor gripe at this point is that it's
> difficult to transfer notebook contents to other media formats, such as
> this email (unless I'm missing a feature somewhere or there's some
> clever way I haven't discovered). A download as txt or rtf feature would
> be handy.

Yes, that's still kind of rough.  For quick and dirty copy/pasting,
using the 'print view' from the File menu lets you at least copy
across cell boundaries.  Here is an example where I copied some text
and one cell with input and output from a notebook:

#### begin copy
How to save a Numpy array in R format easily with IPython's R magic

Let's first read a numpy array from disk (previously created):
In [14]:

a = np.load('a.npy')
a
Out[14]:
array([[ 1.53072231, -0.79962718,  1.74566634,  1.74568946],
       [-0.11861236,  0.54944315, -2.29244152,  1.1506366 ],
       [-0.87465535,  0.59942163,  1.30057786, -0.53106596],
       [-2.30120764,  0.07856033, -1.7061942 , -0.14286314]])

#### end copy

The In/Out prompts end up on lines by themselves, but at least it works.

> Anyway, I've been reading the docs and playing with parallelization of a
> simple routine, but so far I've only managed to make it much slower, so
> I'm hoping someone might be able to give me some advice. I'm worried
> that perhaps the calculations are so simple, no matter how I parallelize
> it the costs of un/packing and transporting the objects might dwarf the
> calcs.

That's the first consideration.  I suggest you play a little bit with
the latency/throughput measurements explained here:

http://minrk.github.com/scipy-tutorial-2011/performance.html

and that material (in updated form) is available as notebooks you can
download from here:

https://github.com/ipython/ipython-in-depth/tree/master/notebooks

while you can watch their delivery on video (third video on that page):

http://ipython.org/videos.html

As a rule of thumb, a task needs to take on the order of 10-100 ms to
execute before it's worth parallelizing with IPython, *assuming zero
data transfer cost*.  You'll also need to add the data transfer costs
on top of that before drawing any final conclusions.
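
To make that rule of thumb concrete, here is a rough back-of-the-envelope
sketch in plain Python.  It is not IPython's API and not a measurement:
the function names, the 5 ms per-task overhead and the 4 engines are all
assumptions chosen for illustration, so plug in the numbers you measure
on your own cluster.

```python
import math


def estimated_speedup(n_tasks, task_s, overhead_s, transfer_s=0.0, n_engines=4):
    """Rough speedup of a parallel run over a plain serial loop.

    Assumed model (hypothetical, for illustration only): every task pays a
    fixed scheduling overhead plus a data transfer cost, and the tasks are
    spread evenly over the engines.
    """
    serial = n_tasks * task_s
    parallel = n_tasks * (task_s + overhead_s + transfer_s) / n_engines
    return serial / parallel


def min_batch_size(task_s, target_s=0.010):
    """How many quick calculations to group into one task so that the
    task lasts at least target_s seconds (10 ms by default)."""
    return max(1, math.ceil(target_s / task_s))


# Assumed numbers: 5 ms of overhead per task, 4 engines, no data transfer.
quick = estimated_speedup(1000, task_s=0.0001, overhead_s=0.005)
slow = estimated_speedup(1000, task_s=0.1, overhead_s=0.005)
print("0.1 ms tasks: speedup %.2f" % quick)
print("100 ms tasks: speedup %.2f" % slow)
```

Under these assumed numbers the 0.1 ms tasks come out slower in parallel
than in serial (speedup below 1), while the 100 ms tasks get close to the
number of engines.  min_batch_size() gives a feel for how many quick
calculations would have to be grouped into a single task to reach the
10 ms end of the range.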

Parallel computing is always a game of tradeoffs between communication
and computation costs, and the former are doubly important in
non-shared-memory contexts such as IPython.

If you only want to do multicore parallelization and can share your
array data across cores without locking problems (a not completely
uncommon case), you might want to have a look at Cython's parallel
primitives. I haven't used them myself so I can't comment in detail,
but it may be worth a look.

Cheers,

f

