[IPython-User] ipcluster runs out of memory and can't purge results

Johann Rohwer jr@sun.ac...
Fri Dec 14 02:49:16 CST 2012


I'm running a very large computation on a 110-node ipcluster with a 
shared home directory, where the engines are launched with ssh. 
Basically it's a total of 2349 runs, each producing a (512,50,23) 
dataset, so the total resulting data will be on the order of 6 GB. The 
problem I'm experiencing is that the ipython process running the 
client just keeps growing in memory and eventually kills the 
machine (the machine running the controller as well as the client has 
about 5 GB of RAM including swap).

To get around this I've chunked the computation so that the memory 
can cope with one chunk at a time. The problem is that the memory does 
not appear to be freed when rc.purge_results('all') is called (see 
below) and just keeps increasing. I'm using HDF5 for data storage.  
I've also tried playing around with simple jobs, calling one 
function and then running rc.purge_results('msg_id_of_async_result'), 
but the result is still listed in rc.results and does not seem to be 
cleared. What am I doing wrong? Should I perhaps be deleting the rc 
object after each chunk and re-instantiating it?
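In case it is relevant, this is the kind of client-side cleanup I have in mind (a sketch, assuming purge_results only clears the Hub's database and that the Client and View keep their own local caches under these attribute names):

```python
# Sketch, assuming rc.purge_results() only clears the Hub's database,
# while the Client and View objects keep separate local caches
# (attribute names as on IPython.parallel's Client/View objects).
def clear_local_caches(rc, view):
    """Drop the client-side result caches so Python can free the memory."""
    rc.results.clear()    # cached results, keyed by msg_id
    rc.metadata.clear()   # per-message metadata, keyed by msg_id
    rc.history = []       # msg_ids the client has submitted
    view.results.clear()  # the view keeps its own results cache
    view.history = []     # and its own submission history
```

If this is the right idea, I would call clear_local_caches(rc, lv) right after rc.purge_results('all') at the end of each chunk.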

The essential points of the code producing the problem are included 
below.

--Johann

--------------------------- 8-< ------------------------------------
from h5py import File
from IPython.parallel import Client

def Analyze(p):
    <some code>
    return res # numpy array of shape (512,50,23)

hdf = File('mod_res.hdf','w')
s = (2349,512,50,23)
ds = hdf.require_dataset('results',shape=s,dtype='f')
hdf.flush()

rc = Client()
lv = rc.load_balanced_view()

# pall is an input array of parameters for Analyze, length 2349
chunk = 600             # chunk size for splitting up the computation
n = len(pall)/chunk + 1   # number of chunks (Python 2 integer division)

for j in range(n):
    arl = []  # asynchronous results list
    ps = pall[j*chunk:(j+1)*chunk].copy()

    for i in range(len(ps)):
        arl.append(lv.apply(Analyze, ps[i]))
    lv.wait()

    for i in range(len(arl)):
        ar = arl[i].get()
        ds[j*chunk+i] = ar
        hdf.flush()

    rc.purge_results('all')

hdf.close()
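Incidentally, the chunk count above relies on Python 2 integer division and makes one empty pass when len(pall) happens to be an exact multiple of chunk; a ceiling division avoids that (plain-Python sketch, n_chunks is just a helper name I'm using here):

```python
def n_chunks(total, chunk):
    """Number of chunks needed to cover `total` items, `chunk` at a time."""
    # Ceiling division without floats: -(-a // b) == ceil(a / b)
    return -(-total // chunk)
```

For my numbers, n_chunks(2349, 600) gives 4, the same as len(pall)/chunk + 1, but without the spurious extra iteration in the exact-multiple case.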
