[IPython-User] IPython cluster: stopping

MinRK benjaminrk@gmail....
Fri Jan 27 14:34:11 CST 2012


On Fri, Jan 27, 2012 at 11:24, Ariel Rokem <arokem@gmail.com> wrote:
> Hi everyone,
>
> I am using ipcluster (from a rather recent github master) to run some
> resampling/bootstrapping analysis of a rather large MRI dataset. For now, I
> am running this locally on an eight core machine (on Fedora). I start by
> calling ipcluster start. Everything fires up OK ("Engines appear to have
> started successfully") and things seem to be going fine.
>
> To do the calculations I call something like the following sequence:
>
> rc = p.Client()
> rc[:].execute('import numpy as np')
> ... # A few more imports of my own analysis modules
>
> dview = rc[:]
>
> kappa = []
>
> for i in n:  # n = [8, 16, 32, 64, 128]
>     kappa.append(calc_boot(booter, data, i, params, dview))
>
> Here each value of n is a resampling parameter and calc_boot is a wrapper
> around the computation, which allocates some variables, reorganizes the
> outputs, and includes the lines:
>
> ...
>
> this_kappa = np.zeros(kappa_size)
> m = 0
> while m<B: # B is one of the parameters, how many boot-samples to run
>     this = dview.apply_async(booter, data, n, params).get()
>     this_kappa += this
>     m += len(this)
> ...
>
> this_kappa/=m
> return this_kappa
>
> And booter is the function that does some fitting on the data and calculates
> the specific variable kappa, which then gets averaged into the return
> variable this_kappa etc. That's the lengthy computation itself on the data.
> This seems to work great (and fast!), for a while. Monitoring my system, I
> can see that all eight cpus are running at full throttle. Then, after about
> half an hour of running, I get a message that IPython cluster is stopping
> the engines. Once that happens, everything grinds to a halt.

Can you post the entire message that appears when the cluster stops? Or,
better yet, the entire output of ipcluster when run with `--debug`?

This usually means that ipcluster is being interrupted, but it could
also mean that the controller is crashing for some reason.
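
For anyone following along without a cluster at hand, the accumulation
loop in the quoted calc_boot can be sketched in plain Python. This is only
an illustration under assumptions: fake_apply is a hypothetical stand-in
for dview.apply_async(...).get(), which returns one result per engine; the
toy booter and the n_engines and rng arguments are invented for the
example; and the per-engine results are summed explicitly before dividing
by the count m.

```python
import random


def booter(data, n, rng):
    # Toy statistic (hypothetical): mean of n values drawn with replacement.
    sample = [rng.choice(data) for _ in range(n)]
    return [sum(sample) / n]  # a length-1 "kappa" vector


def fake_apply(func, n_engines, *args):
    # Hypothetical stand-in for dview.apply_async(func, *args).get():
    # returns a list with one result per engine.
    return [func(*args) for _ in range(n_engines)]


def calc_boot(booter, data, n, B, n_engines, rng):
    total = [0.0]  # running sum of kappa vectors (same length as booter's output)
    m = 0          # number of bootstrap samples collected so far
    while m < B:
        results = fake_apply(booter, n_engines, data, n, rng)
        for r in results:
            # Add each engine's result element-wise to the running sum.
            total = [t + x for t, x in zip(total, r)]
        m += len(results)
    # Average over all collected samples; m may overshoot B by up to
    # n_engines - 1 because each round returns one result per engine.
    return [t / m for t in total], m


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    for n in [8, 16, 32, 64, 128]:
        kappa, m = calc_boot(booter, data, n, B=100, n_engines=8, rng=rng)
        print(n, m, kappa)
```

Only the running sum and the counter persist between rounds in this
sketch, so the accumulator itself stays small however large B is.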

>
> I don't know if this is relevant, but I noticed that while my analysis is
> running, memory usage skyrockets, even though kappa is not a very large
> variable and is only a derived measure from the data. When the IPython
> cluster stops, memory usage goes back down as well.
>
> Any ideas on how to keep my cluster going?
>
> Thanks!
>
> Ariel

