[IPython-User] Using IPython as a Batch Queue

Erik Petigura eptune@gmail....
Mon Jan 23 00:53:49 CST 2012


Dear Wes and Min,

Thanks for the suggestions regarding other programs for managing batch submission.  If it's okay, I'd like to understand a bit more about what's going on in the IPython framework.


>>>> 
>>>> Periodically, one of my cores drops out.
>>> 
>>> Can you explain this one? Is there any indication as to why one of
>>> your engines fails?  It's possible this is an erroneous heart failure,
>>> which can be alleviated by relaxing the heartbeat period to 5-10
>>> seconds with:
>>> 
>>> c.HeartMonitor.period = 10
>>> 
>>> in your ipcontroller_config.py
>>> 
>>> 


Here is a `ps aux' dump of what's going on.  I cleaned up the paths for readability.



USER       PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
petigura 56013 100.0  0.3  2634324  53768 s003  R+   10:26PM   0:35.62 python val2134.py
petigura 55962  99.0  0.3  2652864  55496 s003  R+   10:26PM   1:13.14 python val2140.py
petigura 56025  99.0  0.3  2635648  53692 s003  R+   10:26PM   0:28.09 python val2139.py
petigura 55812  98.5  0.3  2653816  62736 s003  R+   10:24PM   2:36.85 python val2135.py
petigura 38665  22.6  0.5  2699096  99376 s002  R+   12:17PM  82:11.48 python ipython --pylab
petigura 44579   0.3  0.2  2559724  33472 s003  S+    3:30PM   2:15.77 python ipcluster start --n=8
petigura 44584   0.1  0.3  2643632  61900 s003  S+    3:30PM   1:07.71 python ipcontrollerapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 53491   0.0  0.0  2666688    432 s003  S+    9:17PM   0:00.00 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44596   0.0  0.3  2666688  55640 s003  S+    3:30PM   0:06.63 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44595   0.0  0.3  2665664  55680 s003  S+    3:30PM   0:06.88 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44594   0.0  0.3  2666688  55636 s003  S+    3:30PM   0:07.32 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44593   0.0  0.3  2666688  55676 s003  S+    3:30PM   0:07.19 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44592   0.0  0.3  2664640  55668 s003  S+    3:30PM   0:07.60 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44591   0.0  0.3  2665664  55776 s003  S+    3:30PM   0:07.96 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44590   0.0  0.3  2665664  55680 s003  S+    3:30PM   0:07.72 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44589   0.0  0.3  2664640  55676 s003  S+    3:30PM   0:08.31 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44588   0.0  0.2  2635272  39724 s003  S+    3:30PM   0:25.99 python ipcontrollerapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44587   0.0  0.0  2623100   2844 s003  S+    3:30PM   0:00.01 python ipcontrollerapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44586   0.0  0.0  2623100   2708 s003  S+    3:30PM   0:00.01 python ipcontrollerapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44585   0.0  0.0  2614908   2752 s003  S+    3:30PM   0:00.01 python ipcontrollerapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 56024   0.0  0.0  2435544    808 s003  S+   10:26PM   0:00.01 /bin/sh -c python val2139.py > val2139.log
petigura 56012   0.0  0.0  2435544    808 s003  S+   10:26PM   0:00.01 /bin/sh -c python val2134.py > val2134.log
petigura 55961   0.0  0.0  2435544    808 s003  S+   10:26PM   0:00.01 /bin/sh -c python val2140.py > val2140.log
petigura 55811   0.0  0.0  2435544    808 s003  S+   10:24PM   0:00.01 /bin/sh -c python val2135.py > val2135.log
petigura 53728   0.0  0.0  2666688    428 s003  S+    9:31PM   0:00.00 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 53673   0.0  0.0  2665664    420 s003  S+    9:27PM   0:00.00 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 53670   0.0  0.0  2665664    432 s003  S+    9:27PM   0:00.00 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20

Here are some observations:

1. 8 instances of ipengineapp.py were started when I started my jobs at 3:30pm.  
2. Around 9:30pm, 4 of the cores stopped working and 4 new instances of ipengineapp.py were started (at 9:17pm, 9:27pm, 9:27pm, and 9:31pm according to the dump).
3. Now only 4 cores are working.
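
The engine counts in the list above can be checked against the dump itself. What follows is only a minimal sketch and not part of IPython: the function name `engines_by_start` is made up for illustration, and the rows are the ipengineapp.py lines from the dump with the command-line arguments trimmed for brevity.

```python
from collections import defaultdict

# ipengineapp.py rows copied from the `ps aux` dump above
# (command-line arguments trimmed for brevity).
PS_DUMP = """\
USER       PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
petigura 53491   0.0  0.0  2666688    432 s003  S+    9:17PM   0:00.00 python ipengineapp.py
petigura 44596   0.0  0.3  2666688  55640 s003  S+    3:30PM   0:06.63 python ipengineapp.py
petigura 44595   0.0  0.3  2665664  55680 s003  S+    3:30PM   0:06.88 python ipengineapp.py
petigura 44594   0.0  0.3  2666688  55636 s003  S+    3:30PM   0:07.32 python ipengineapp.py
petigura 44593   0.0  0.3  2666688  55676 s003  S+    3:30PM   0:07.19 python ipengineapp.py
petigura 44592   0.0  0.3  2664640  55668 s003  S+    3:30PM   0:07.60 python ipengineapp.py
petigura 44591   0.0  0.3  2665664  55776 s003  S+    3:30PM   0:07.96 python ipengineapp.py
petigura 44590   0.0  0.3  2665664  55680 s003  S+    3:30PM   0:07.72 python ipengineapp.py
petigura 44589   0.0  0.3  2664640  55676 s003  S+    3:30PM   0:08.31 python ipengineapp.py
petigura 53728   0.0  0.0  2666688    428 s003  S+    9:31PM   0:00.00 python ipengineapp.py
petigura 53673   0.0  0.0  2665664    420 s003  S+    9:27PM   0:00.00 python ipengineapp.py
petigura 53670   0.0  0.0  2665664    432 s003  S+    9:27PM   0:00.00 python ipengineapp.py
"""


def engines_by_start(ps_text):
    """Group ipengineapp.py processes by their STARTED column.

    Returns a dict mapping start time -> list of (pid, rss_kb) tuples.
    Assumes the 11-column `ps aux` layout shown in the dump.
    """
    groups = defaultdict(list)
    for line in ps_text.splitlines():
        # Split into at most 11 fields so COMMAND keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or "ipengineapp.py" not in fields[10]:
            continue  # skips the header and any non-engine rows
        pid, rss_kb, started = int(fields[1]), int(fields[5]), fields[8]
        groups[started].append((pid, rss_kb))
    return dict(groups)


if __name__ == "__main__":
    for started, procs in engines_by_start(PS_DUMP).items():
        print(started, len(procs), procs)
```

Grouping by start time separates the 8 engines launched at 3:30pm from the 4 later ones, and listing the RSS column alongside each PID makes the difference in resident memory between the two groups easy to see.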

What exactly does the heartbeat do?  Why would an engine work for many hours before dropping out?

Thanks,

Erik


