[IPython-User] Engines die during long parallel computation

Jörg Schnitzbauer joschnitzbauer@gmail....
Wed Nov 21 17:37:10 CST 2012


Ok,
I installed Lubuntu in VirtualBox and ran exactly the same notebook there.
The problem did not appear on the Linux system, so I guess it is a Windows
issue.
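
One way to narrow down where an engine dies in a run like this is to have each task leave a trail on disk. The following is only a minimal sketch using the standard library; `run_logged`, the `log_dir` parameter and the `task-<pid>.log` file name are made up for illustration and are not part of IPython. It assumes the task function can be called through such a helper on the engine side.

```python
import os
import time
import traceback


def run_logged(func, *args, log_dir="."):
    """Call func(*args), recording start, end and any exception
    in a per-process log file (hypothetical helper, not an IPython API)."""
    name = getattr(func, "__name__", repr(func))
    path = os.path.join(log_dir, "task-%d.log" % os.getpid())
    with open(path, "a") as log:

        def note(msg):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write("%s %s\n" % (stamp, msg))
            log.flush()
            # Ask the OS to write the line to disk so it survives a hard crash.
            os.fsync(log.fileno())

        note("start %s args=%r" % (name, args))
        try:
            result = func(*args)
        except Exception:
            # A Python-level failure: keep the traceback, then re-raise.
            note("error in %s\n%s" % (name, traceback.format_exc()))
            raise
        note("done %s" % name)
        return result
```

If a process is killed outright, its log ends with a "start" line that has no matching "done" or "error" line, which at least shows which call was in flight and when.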

Joerg


On Fri, Nov 16, 2012 at 11:32 PM, Jörg Schnitzbauer <
joschnitzbauer@gmail.com> wrote:

> Hi,
>
> the IPython Notebook is phenomenal, especially with its easy parallel
> computing functions.
> However, it has become unusable for me, because the engines keep dying
> during long computations with this error:
>
>
> got stale result: 277aae49-9550-4bc1-a003-e7f2d1bdff43
> EngineError(Engine 1 died while running task '277aae49-9550-4bc1-a003-e7f2d1bdff43')
>
> (I pasted the rest of the error message at the end of the mail.)
> It is nearly impossible to debug the issue, because it usually happens
> after some hours of operation.
> I am sure that my code does not contain bugs, because it runs correctly
> when I simply do fewer of the computations, which results in a shorter
> run time.
>
> I am using numpy, so I tried to use the solution proposed here:
> http://lists.ipython.scipy.org/pipermail/ipython-user/2012-June/010506.html
> However, it does not fix the issue.
>
> Is there any known cause of this failure and, if so, how can it
> be fixed?
>
> I am running the Notebook server on Windows Server 2008 R2 Standard
> with 16 cores. A cluster of 10 engines is started from the web
> interface.
>
> Any help would be much appreciated!
>
> Thanks for the great work!
> Joerg
>
>
>
>
>
> {'parent_header': {u'date': datetime.datetime(2012, 11, 16, 21, 5, 19, 110000), u'username': u'username', u'session': u'6d49863b-a939-4765-8fda-ea34429da3b3', u'msg_id': u'277aae49-9550-4bc1-a003-e7f2d1bdff43', u'msg_type': u'apply_request'}, 'msg_type': u'apply_reply', 'msg_id': u'c7642642-5070-40f6-bc3a-aae8d2c5019a', 'content': {u'status': u'ok'}, 'header': {u'username': u'username', u'engine': u'8ba6c4f2-d325-460d-9773-dd1a9b979570', u'msg_type': u'apply_reply', u'dependencies_met': True, u'msg_id': u'c7642642-5070-40f6-bc3a-aae8d2c5019a', u'started': datetime.datetime(2012, 11, 16, 21, 5, 19, 150000), u'session': u'8ba6c4f2-d325-460d-9773-dd1a9b979570', u'status': u'ok', u'date': datetime.datetime(2012, 11, 16, 21, 6, 1, 677000)}, 'buffers': ['\x80\x02]q\x01(cIPython.utils.newserialized\nSerializeIt\nq\x02)\x81q\x03}q\x04(U\x04dataq\x05NU\x0etypeDescriptorq\x06U\x06pickleq\x07U\x08metadataq\x08}ubh\x02)\x81q\t}q\n(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x0b}q\x0c(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\r}q\x0e(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x0f}q\x10(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x11}q\x12(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x13}q\x14(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x15}q\x16(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x17}q\x18(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x19}q\x1a(h\x05Nh\x06h\x07h\x08}ube.', '\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.', '\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.', '\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.', 
'\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.', '\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.', '\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.', '\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.', '\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.', '\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.', '\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.']}
>
>