[IPython-User] Engines die during long parallel computation

Jörg Schnitzbauer joschnitzbauer@gmail....
Sat Nov 17 01:32:20 CST 2012


Hi,

The IPython Notebook is phenomenal, especially with its easy-to-use parallel
computing functions.
However, it has become unusable for me because the engines keep dying during
long computations with this error:


got stale result: 277aae49-9550-4bc1-a003-e7f2d1bdff43
EngineError(Engine 1 died while running task
'277aae49-9550-4bc1-a003-e7f2d1bdff43')

(I pasted the rest of the error message at the end of the mail.)
The issue is very hard to debug because it usually only appears after
several hours of operation.
I am sure that my code is not buggy as such, because it runs correctly
when I simply do fewer computations, which results in a shorter run time.

I am using NumPy, so I tried the solution proposed here:
http://lists.ipython.scipy.org/pipermail/ipython-user/2012-June/010506.html
However, it does not fix the issue.

Is there any known cause for this failure, and if so, how can it be
fixed?

I am running the Notebook server on Windows Server 2008 R2 Standard with
16 cores. A cluster of 10 engines is started from the web interface.

Any help would be much appreciated!

Thanks for the great work!
Joerg





{'parent_header': {u'date': datetime.datetime(2012, 11, 16, 21, 5, 19,
110000), u'username': u'username', u'session':
u'6d49863b-a939-4765-8fda-ea34429da3b3', u'msg_id':
u'277aae49-9550-4bc1-a003-e7f2d1bdff43', u'msg_type':
u'apply_request'}, 'msg_type': u'apply_reply', 'msg_id':
u'c7642642-5070-40f6-bc3a-aae8d2c5019a', 'content': {u'status':
u'ok'}, 'header': {u'username': u'username', u'engine':
u'8ba6c4f2-d325-460d-9773-dd1a9b979570', u'msg_type': u'apply_reply',
u'dependencies_met': True, u'msg_id':
u'c7642642-5070-40f6-bc3a-aae8d2c5019a', u'started':
datetime.datetime(2012, 11, 16, 21, 5, 19, 150000), u'session':
u'8ba6c4f2-d325-460d-9773-dd1a9b979570', u'status': u'ok', u'date':
datetime.datetime(2012, 11, 16, 21, 6, 1, 677000)}, 'buffers':
['\x80\x02]q\x01(cIPython.utils.newserialized\nSerializeIt\nq\x02)\x81q\x03}q\x04(U\x04dataq\x05NU\x0etypeDescriptorq\x06U\x06pickleq\x07U\x08metadataq\x08}ubh\x02)\x81q\t}q\n(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x0b}q\x0c(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\r}q\x0e(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x0f}q\x10(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x11}q\x12(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x13}q\x14(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x15}q\x16(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x17}q\x18(h\x05Nh\x06h\x07h\x08}ubh\x02)\x81q\x19}q\x1a(h\x05Nh\x06h\x07h\x08}ube.',
'\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.',
'\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.',
'\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.',
'\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.',
'\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.',
'\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.',
'\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.',
'\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.',
'\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.',
'\x80\x02cnumpy.core.multiarray\nscalar\nq\x01cnumpy\ndtype\nq\x02U\x02f8K\x00K\x01\x87Rq\x03(K\x03U\x01<NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tbU\x08\xbd+\xd2\x13\xe6,\x93\xbf\x86Rq\x04.']}