[IPython-User] Using IPython as a Batch Queue
Erik Petigura
eptune@gmail....
Mon Jan 23 00:53:49 CST 2012
Dear Wes and Min,
Thanks for the suggestions regarding other programs for managing batch submission. If it's okay, I'd like to understand a bit more what's going on in the IPython framework.
>>>>
>>>> Periodically, one of my cores drops out.
>>>
>>> Can you explain this one? Is there any indication as to why one of
>>> your engines fails? It's possible this is an erroneous heart failure,
>>> which can be alleviated by relaxing the heartbeat period to 5-10
>>> seconds with:
>>>
>>> c.HeartMonitor.period = 10
>>>
>>> in your ipcontroller_config.py
>>>
>>>
Here is a `ps aux' dump of what's going on. I cleaned up the paths for readability.
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
petigura 56013 100.0 0.3 2634324 53768 s003 R+ 10:26PM 0:35.62 python val2134.py
petigura 55962 99.0 0.3 2652864 55496 s003 R+ 10:26PM 1:13.14 python val2140.py
petigura 56025 99.0 0.3 2635648 53692 s003 R+ 10:26PM 0:28.09 python val2139.py
petigura 55812 98.5 0.3 2653816 62736 s003 R+ 10:24PM 2:36.85 python val2135.py
petigura 38665 22.6 0.5 2699096 99376 s002 R+ 12:17PM 82:11.48 python ipython --pylab
petigura 44579 0.3 0.2 2559724 33472 s003 S+ 3:30PM 2:15.77 python ipcluster start --n=8
petigura 44584 0.1 0.3 2643632 61900 s003 S+ 3:30PM 1:07.71 python ipcontrollerapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 53491 0.0 0.0 2666688 432 s003 S+ 9:17PM 0:00.00 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44596 0.0 0.3 2666688 55640 s003 S+ 3:30PM 0:06.63 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44595 0.0 0.3 2665664 55680 s003 S+ 3:30PM 0:06.88 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44594 0.0 0.3 2666688 55636 s003 S+ 3:30PM 0:07.32 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44593 0.0 0.3 2666688 55676 s003 S+ 3:30PM 0:07.19 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44592 0.0 0.3 2664640 55668 s003 S+ 3:30PM 0:07.60 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44591 0.0 0.3 2665664 55776 s003 S+ 3:30PM 0:07.96 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44590 0.0 0.3 2665664 55680 s003 S+ 3:30PM 0:07.72 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44589 0.0 0.3 2664640 55676 s003 S+ 3:30PM 0:08.31 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44588 0.0 0.2 2635272 39724 s003 S+ 3:30PM 0:25.99 python ipcontrollerapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44587 0.0 0.0 2623100 2844 s003 S+ 3:30PM 0:00.01 python ipcontrollerapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44586 0.0 0.0 2623100 2708 s003 S+ 3:30PM 0:00.01 python ipcontrollerapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 44585 0.0 0.0 2614908 2752 s003 S+ 3:30PM 0:00.01 python ipcontrollerapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 56024 0.0 0.0 2435544 808 s003 S+ 10:26PM 0:00.01 /bin/sh -c python val2139.py > val2139.log
petigura 56012 0.0 0.0 2435544 808 s003 S+ 10:26PM 0:00.01 /bin/sh -c python val2134.py > val2134.log
petigura 55961 0.0 0.0 2435544 808 s003 S+ 10:26PM 0:00.01 /bin/sh -c python val2140.py > val2140.log
petigura 55811 0.0 0.0 2435544 808 s003 S+ 10:24PM 0:00.01 /bin/sh -c python val2135.py > val2135.log
petigura 53728 0.0 0.0 2666688 428 s003 S+ 9:31PM 0:00.00 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 53673 0.0 0.0 2665664 420 s003 S+ 9:27PM 0:00.00 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
petigura 53670 0.0 0.0 2665664 432 s003 S+ 9:27PM 0:00.00 python ipengineapp.py --profile-dir /Users/petigura/.ipython/profile_default --log-to-file --log-level=20
Here are some observations:
1. 8 instances of ipengineapp.py were started when I started my jobs at 3:30pm.
2. Around 9:30pm, 4 of the cores stopped working and 4 new instances of ipengineapp.py were started.
3. Now only 4 cores are working.
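These counts can be checked mechanically against the dump. Below is a minimal sketch, assuming the `ps aux` output is available as text; the function name `engines_by_start` is hypothetical, and the sample rows are copied from the dump above with the engine command lines shortened to `python ipengineapp.py` for readability.

```python
from collections import Counter

# Rows copied from the dump above; engine COMMAND fields are abbreviated
# (an assumption made for readability, the real lines carry more flags).
PS_ROWS = """\
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
petigura 56013 100.0 0.3 2634324 53768 s003 R+ 10:26PM 0:35.62 python val2134.py
petigura 53491 0.0 0.0 2666688 432 s003 S+ 9:17PM 0:00.00 python ipengineapp.py
petigura 44596 0.0 0.3 2666688 55640 s003 S+ 3:30PM 0:06.63 python ipengineapp.py
petigura 44595 0.0 0.3 2665664 55680 s003 S+ 3:30PM 0:06.88 python ipengineapp.py
petigura 44594 0.0 0.3 2666688 55636 s003 S+ 3:30PM 0:07.32 python ipengineapp.py
petigura 44593 0.0 0.3 2666688 55676 s003 S+ 3:30PM 0:07.19 python ipengineapp.py
petigura 44592 0.0 0.3 2664640 55668 s003 S+ 3:30PM 0:07.60 python ipengineapp.py
petigura 44591 0.0 0.3 2665664 55776 s003 S+ 3:30PM 0:07.96 python ipengineapp.py
petigura 44590 0.0 0.3 2665664 55680 s003 S+ 3:30PM 0:07.72 python ipengineapp.py
petigura 44589 0.0 0.3 2664640 55676 s003 S+ 3:30PM 0:08.31 python ipengineapp.py
petigura 53728 0.0 0.0 2666688 428 s003 S+ 9:31PM 0:00.00 python ipengineapp.py
petigura 53673 0.0 0.0 2665664 420 s003 S+ 9:27PM 0:00.00 python ipengineapp.py
petigura 53670 0.0 0.0 2665664 432 s003 S+ 9:27PM 0:00.00 python ipengineapp.py
"""


def engines_by_start(ps_text):
    """Count ipengineapp.py processes per STARTED value in `ps aux` text."""
    counts = Counter()
    for line in ps_text.splitlines():
        # Split into the 11 ps columns; COMMAND (the last) keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip blank lines and the header row
        started, command = fields[8], fields[10]
        if "ipengineapp.py" in command:
            counts[started] += 1
    return counts


if __name__ == "__main__":
    for started, n in engines_by_start(PS_ROWS).items():
        print(started, n)
```

On these rows the tally is 8 engines started at 3:30PM and 4 more started between 9:17PM and 9:31PM, which matches observations 1 and 2.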
What exactly does the heartbeat do? Why would an engine work for many hours before dropping out?
Thanks,
Erik
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ps.txt
Url: http://mail.scipy.org/pipermail/ipython-user/attachments/20120122/4f6813b7/attachment-0001.txt