[IPython-User] ipcluster runs out of memory and can't purge results

Johann Rohwer jr@sun.ac...
Fri Dec 14 06:05:59 CST 2012


To follow up on my own post, I have run the script below for a single 
chunk (600 parameter sets). The ipython process then consumes about 
2.5Gb of memory. I have then deleted the rc, lv, arl, ds and hdf 
objects in the ipython console where I ran the script. I have also 
imported gc and ran gc.collect(). However, the ipython process still 
uses 2.5Gb of memory. Only when I quit the ipython session is the 
memory freed.

(Note that the ipcontroller and ipengine processes are behaving fine.)

I am really puzzled by this...
--Johann

On Friday 14 December 2012 10:49:16 Johann Rohwer wrote:
> I'm running a very large computation on a 110-node ipcluster with
> shared home directory and where the enginges are launched with ssh.
> Basically it's a total of 2349 runs which each produce a (512,50,23)
> dataset, so the total resulting data will be on the order of 6Gb.
> The problem I'm experiencing is that the ipython process that runs
> the client just keeps increasing in memory and in the end kills the
> machine (the machine running the controller as well as the client
> has about 5Gb RAM including swap).
> 
> To get around this I've chuncked the computation so that the memory
> copes with one of the chunks. The problem is that the memory does
> not appear to be freed when rc.purge_results('all') is called (see
> below) and just keeps increasing in size. I'm using HDF5 for data
> storage. I've also tried playing around with simple jobs just
> calling one function and then running
> rc.purge_results('msg_id_of_async_result'), but the result is still
> listed in rc.results and does not seem to be cleared. What am I
> doing wrong? Should I perhaps be deleting the rc object after each
> chunk and re-instantiating it?
> 
> The essential points of the code producing the problem are included
> below.
> 
> --Johann
> 
> --------------------------- 8-< ------------------------------------
> from h5py import File
> from IPython.parallel import Client
> 
> def Analyze(p):
>     <some code>
>     return res # numpy array of shape (512,50,23)
> 
> hdf = File('mod_res.hdf','w')
> s = (2349,512,50,23)
> ds = hdf.require_dataset('results',shape=s,dtype='f')
> hdf.flush()
> 
> rc = Client()
> lv = rc.load_balanced_view()
> 
> # pall is an input array of parameters for Analyze, length 2349
> chunk = 600             # chunk size for splitting up the
> computation n = len(pall)/chunk + 1   # number of runs
> 
> for j in range(n):
>     arl = []  # asynchronous results list
>     ps = pall[j*chunk:(j+1)*chunk].copy()
> 
>     for i in range(len(ps)):
>         arl.append(lv.apply(Analyze, ps[i]))
>     lv.wait()
> 
>     for i in range(len(arl)):
>         ar = arl[i].get()
>         ds[j*chunk+i] = ar
>         hdf.flush()
> 
>     rc.purge_results('all')
> 
> hdf.close()
> 


E-pos vrywaringsklousule

Hierdie e-pos mag vertroulike inligting bevat en mag regtens geprivilegeerd wees en is slegs bedoel vir die persoon aan wie dit geadresseer is. Indien u nie die bedoelde ontvanger is nie, word u hiermee in kennis gestel dat u hierdie dokument geensins mag gebruik, versprei of kopieer nie. Stel ook asseblief die sender onmiddellik per telefoon in kennis en vee die e-pos uit. Die Universiteit aanvaar nie aanspreeklikheid vir enige skade, verlies of uitgawe wat voortspruit uit hierdie e-pos en/of die oopmaak van enige lês aangeheg by hierdie e-pos nie.

E-mail disclaimer

This e-mail may contain confidential information and may be legally privileged and is intended only for the person to whom it is addressed. If you are not the intended recipient, you are notified that you may not use, distribute or copy this document in any manner whatsoever. Kindly also notify the sender immediately by telephone, and delete the e-mail. The University does not accept liability for any damage, loss or expense arising from this e-mail and/or accessing any files attached to this e-mail.


More information about the IPython-User mailing list