[IPython-User] ipython parallel

Wolfgang Kerzendorf wkerzendorf@gmail....
Fri Jun 22 15:10:23 CDT 2012


That helps a lot, yes, I hadn't read through this properly.
I had a look but can't find default files for ipcontroller_config.py or ipcluster_config.py.

Thanks for your help,
   Wolfgang
On 2012-06-22, at 3:20 PM, MinRK wrote:

> 
> 
> On Fri, Jun 22, 2012 at 9:03 AM, Wolfgang Kerzendorf <wkerzendorf@gmail.com> wrote:
> Hey guys,
> 
> When I first saw 0.12 I only explored the new notebook feature and that was a stunning experience. I have now started to look into parallel processing with ipython and I am blown away. This is really great work!!! Thanks for sharing this!!
> 
> There are a couple of things that I haven't quite figured out yet (I have read through most of the manual):
> 
> Is there an easy way to start a cluster on a server and connect to it from a client? (I think that should be an important example in the tutorial; I didn't easily find one.)
> Running ipcluster start on the server starts an ipcontroller that doesn't seem to listen to the outside. So what I have done so far is run ipcontroller --ip=<myip> and then ipcluster engine, then copy the client JSON file to the client and connect. Is there a better way to do this?
> 
> Yes, the docs describe how to put these things in config files. In your ipcontroller_config.py:
> 
> c.HubFactory.ip = '*'
> 
> There is nothing that you can do on the command-line that you cannot do permanently in a config file.
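> 
> For a more complete sketch of that setup (the profile path and connection-file path below are just the usual defaults, so adjust for your system):
> 
> # on the server, in your profile's ipcontroller_config.py
> # (typically ~/.ipython/profile_default/ipcontroller_config.py)
> c = get_config()
> c.HubFactory.ip = '*'  # listen on all interfaces, not just loopback
> 
> # then start the cluster on the server, copy the ipcontroller-client.json
> # connection file it writes to the client machine, and connect with it:
> from IPython import parallel
> rc = parallel.Client('/path/to/ipcontroller-client.json')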
>  
> 
> The question I have is how to push in a load-balanced view (the documentation says this doesn't make sense, but I think for my problem it does).
> 
> The reason pushing on a load-balanced view makes no sense is the *push itself* would be load-balanced.  That is, LBView.push({'a' : a}) will push a to *one* engine, but you have no idea which one.  If you want to push to all engines, then you want to use a DirectView.  This is a standard pattern: use a DirectView to set up data / namespaces, then use a LoadBalancedView to distribute work across the engines.
>  
> 
> I am running many Monte Carlo simulations (thousands of them, each an independent job) that each need several hundred MB of exactly the same data. My idea is to preload this data on all of the engines so that I can access it as a global variable whenever I need it. How do I do this?
> 
> 
> Another question: what happens when a view is closed? Do the engines clear out all of the pushed data and return to a pristine state (that would be desirable, imho)?
> 
> Nope, View.close just closes a few sockets, and involves no communication whatsoever.  Implicitly resetting the namespace is probably not a good idea because there can be many other views on those same engines, and they would be surprised if simply removing a view affected the engine state.
> 
> This and the above question suggest a bit of confusion about the engines and their namespaces.  Each engine has *exactly one* namespace.  The different Views are simply different ways of interacting with those namespaces.
> 
> If you want to do *anything* on more than one engine, use a DirectView.  Any individual action you do with LoadBalancedView will happen on exactly one engine (map is multiple actions).  All views are talking to the same namespaces, only talking to them in different ways.
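> 
> A tiny illustration of this (assuming rc is a connected parallel.Client): a value set through a DirectView is visible to work submitted through a LoadBalancedView, because both talk to the same engine namespaces:
> 
> from IPython import parallel
> dv = rc[:]
> lbview = rc.load_balanced_view()
> dv['x'] = 10  # set x in every engine's namespace
> ar = lbview.apply(lambda y: 2 * y, parallel.Reference('x'))  # runs on one engine, using that engine's x
> ar.get()  # -> 20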
> 
> So I want to run a function a thousand times, f(a, b), where a = arange(1000) and b is constant across all 1000 tasks and consists of a 100 MB numpy array.
> 
> from numpy import arange
> from IPython import parallel
> 
> rc = parallel.Client()  # connect to the cluster
> 
> # Step 0: create *two* views, a DirectView of all engines and a load-balanced view
> dv = rc[:]
> lbview = rc.load_balanced_view()
> 
> # Step 1: initialize data
> 
> # Depending on where b comes from, there are two options:
> 
> # load b directly on each engine (preferable, if possible)
> dv.execute("b = my_load_b('/path/to/b_data_file.dat')")
> 
> # or, if you only have b locally on the client, push it to all engines:
> dv['b'] = b  # aka dv.push(dict(b=b))
> 
> # Now that your engines are set up, you can submit your job:
> 
> a = arange(1000)
> 
> # parallel.Reference is a special wrapper for passing arguments that are
> # already resident on the engines, by name:
> b_ref_list = [parallel.Reference('b')] * len(a)
> 
> amr = lbview.map(f, a, b_ref_list)
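> 
> To collect the results from that, amr is an AsyncMapResult; a minimal sketch:
> 
> results = amr.get()  # block until all tasks finish and return the list of f(a_i, b)
> # or iterate, which yields results in submission order as they become available:
> for r in amr:
>     print r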
> 
>  
> 
> This is a great tool for building a small cloud for a research team. I suggest adding a simple config file, alongside ipcluster, that lists IP addresses, the number of engines to start, and the connection method; ipcluster would then start engines on the specified IP addresses and build the cluster. This is just a really minor suggestion.
> 
> The SSH launchers support exactly this, but I think the current implementation of the SSH launchers is still pretty crummy.
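> 
> For reference, a rough sketch of what SSH mode looks like in ipcluster_config.py (the trait names follow the SSH-mode docs and may differ slightly between IPython versions; hostnames and engine counts are placeholders):
> 
> # use the SSH launchers instead of starting everything locally
> c.IPClusterStart.controller_launcher_class = 'SSHControllerLauncher'
> c.IPClusterEngines.engine_launcher_class = 'SSHEngineSetLauncher'
> 
> # map each host to the number of engines to start there
> c.SSHEngineSetLauncher.engines = {
>     'host1.example.com' : 2,
>     'host2.example.com' : 4,
> }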
> 
> -MinRK
>  
> 
> Cheers
>   Wolfgang
> _______________________________________________
> IPython-User mailing list
> IPython-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-user


