[IPython-User] ipython parallel

MinRK benjaminrk@gmail....
Fri Jun 22 14:20:20 CDT 2012

On Fri, Jun 22, 2012 at 9:03 AM, Wolfgang Kerzendorf wrote:

> Hey guys,
> When I first saw 0.12 I only explored the new notebook feature and that
> was a stunning experience. I have now started to look into parallel
> processing with ipython and I am blown away. This is really great work!!!
> Thanks for sharing this!!
> There are a couple of things that I haven't quite figured out yet (I have
> read through most of the manual):
> Is there an easy way to start a cluster on a server and easily connect to
> it from a client (I think that should be an important example in the
> tutorial that I didn't easily find).
> Running ipcluster start on the server makes an ipcontroller that doesn't
> seem to listen to the outside. So what I have done so far is just run
> ipcontroller --ip=<myip> and then ipcluster engine. Then copying the client
> json file to the client and then connecting. Is there a better way to do
> this?

Yes, the docs <http://ipython.org/ipython-doc/dev/parallel/parallel_process.html#general-considerations>
describe how to put these things in config files. In your ipcontroller_config.py:

c.HubFactory.ip = '*'

There is nothing that you can do on the command-line that you cannot do
permanently in a config file.
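For completeness, a sketch of what the two config files might look like (file names are the defaults under your profile directory; the paths are illustrative placeholders, and `'*'` means listen on all interfaces, so make sure you trust the network):

```python
# ipcontroller_config.py -- on the controller host
# listen on all interfaces so engines/clients on other hosts can connect
c.HubFactory.ip = '*'

# ipengine_config.py -- on the remote engine hosts
# point each engine at the connection file copied over from the
# controller host's security directory (path is an example)
c.IPEngineApp.url_file = u'/path/to/ipcontroller-engine.json'
```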

> The question I have is how to push in loadbalancedview (in the
> documentation it says that this doesn't make sense, but I think for my
> problem it does).

The reason pushing on a load-balanced view makes no sense is that the *push
itself* would be load-balanced.  That is, LBView.push({'a' : a}) will push
a to *one* engine, but you have no idea which one.  If you want to push to
all engines, then you want to use a DirectView.  This is a standard
pattern: use a DirectView to set up data / namespaces, then use a
LoadBalancedView to distribute work across the engines.

> I am running many Monte Carlo simulations (1000s, each one is an
> independent job) that each need several hundred MB of exactly the same data.
> So my idea is to preload this on all the engines and then just access it
> when I need it. I would like to preload this data on all of the engines, so
> I can just access it as a global variable. How do I do this?

> Another question is what happens if a view is closed. Do the engines clear
> out all of the pushed data and go into a pristine state again (that would
> be desirable, imho).

Nope, View.close just closes a few sockets, and involves no communication
whatsoever.  Implicitly resetting the namespace is probably not a good idea
because there can be many other views on those same engines, and they would
be surprised if simply removing a view affected the engine state.

This and the above question suggest a bit of confusion about the engines
and their namespaces.  Each engine has *exactly one* namespace.  The
different Views are simply different ways of interacting with those
namespaces.

If you want to do *anything* on more than one engine, use a DirectView.
 Any individual action you do with LoadBalancedView will happen on exactly
one engine (map is multiple actions).  All views are talking to the same
namespaces, only talking to them in different ways.
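To make the model concrete, here is a toy sketch (emphatically *not* the real IPython.parallel API, just an analogy): each engine owns a single dict namespace, a DirectView's actions touch every engine, and a LoadBalancedView's actions land on one engine you don't get to choose.

```python
import random

class Engine:
    def __init__(self):
        self.namespace = {}          # the engine's single namespace

class DirectView:
    """Talks to *all* engines."""
    def __init__(self, engines):
        self.engines = engines
    def push(self, d):
        for e in self.engines:       # the same update lands everywhere
            e.namespace.update(d)

class LoadBalancedView:
    """Sends each action to exactly *one* engine."""
    def __init__(self, engines):
        self.engines = engines
    def apply(self, f, *names):
        e = random.choice(self.engines)   # you don't control which one
        return f(*(e.namespace[n] for n in names))

engines = [Engine() for _ in range(4)]
dv = DirectView(engines)
lb = LoadBalancedView(engines)

dv.push({'b': 10})                        # set up data on every engine
result = lb.apply(lambda b: b + 1, 'b')   # runs on one engine, any engine
print(result)                             # 11, regardless of which engine ran it
```

Because the DirectView already put `b` in every namespace, the load-balanced call gives the same answer no matter which engine it lands on — which is exactly why "set up with a DirectView, distribute work with a LoadBalancedView" works.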

> So I want to run a function a thousand times (f(a,b)) and a=arange(1000)
> and b is a constant variable for all of these 1000 tasks consisting of a
> 100mb numpy array.

# Step 0: connect a Client, then create *two* views: a DirectView of all
# engines, and a load-balanced view
from IPython import parallel
from numpy import arange

rc = parallel.Client()
dv = rc[:]
lbview = rc.load_balanced_view()

# Step 1: initialize data.  Depending on where b comes from, here are two
# options:

# load b directly on each engine (preferable, if possible)
dv.execute("b = my_load_b('/path/to/b_data_file.dat')")

# or, if you only have b locally on the client, you can push it to all
dv['b'] = b  # aka dv.push(dict(b=b))

# Now that you have your engines set up, you can submit your job:

a = arange(1000)

# there is a special wrapper for passing arguments that refer to objects
# already resident on the engines:
b_ref_list = [parallel.Reference('b')] * len(a)

amr = lbview.map(f, a, b_ref_list)

> This is a great tool to build a small cloud for a research team. I suggest
> making a little easy config file that contains ip addresses and number of
> engines to start (and connection method) that would go along with
> ipcluster. ipcluster would then start up engines on the specified ip
> addresses and build a cluster. This is just a really minor suggestion.

The SSH launchers support exactly this,
but I think the current implementation of the SSH launchers is still pretty
rough.
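For reference, the SSH launcher configuration looks roughly like this in ipcluster_config.py (a sketch: the host names and engine counts are made up, and the exact option names may differ between IPython versions):

```python
# ipcluster_config.py
# use the SSH launcher to start engines on remote hosts
c.IPClusterEngines.engine_launcher_class = 'SSH'

# hosts to start engines on, and how many engines on each
c.SSHEngineSetLauncher.engines = {
    'host1.example.com': 2,
    'host2.example.com': 4,
}
```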


> Cheers
>   Wolfgang
> _______________________________________________
> IPython-User mailing list
> IPython-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-user
