I'm running multiple trials of the same experiment in a for loop.<div><br></div><div> for i in range(10):</div><div> &nbsp;&nbsp;&nbsp;&nbsp;run_experiment()</div><div><br></div><div>It behaves properly for the first several trials, then fails with the following error, which is written to the controller's standard error:</div>
<div><br></div><div><div> MemoryError</div><div> FATAL ERROR: OUT OF MEMORY (epoll.cpp:57)</div></div><div><br></div><div>I've read this thread &lt;<a href="http://mail.scipy.org/pipermail/ipython-user/2012-March/009687.html">http://mail.scipy.org/pipermail/ipython-user/2012-March/009687.html</a>&gt;, so I am already clearing the caches between trials with this subroutine:</div>
<div><br></div><div><div> def clear_cache(rc, dview):</div><div> &nbsp;&nbsp;&nbsp;&nbsp;rc.results.clear()</div><div> &nbsp;&nbsp;&nbsp;&nbsp;rc.metadata.clear()</div><div> &nbsp;&nbsp;&nbsp;&nbsp;dview.results.clear()</div><div> &nbsp;&nbsp;&nbsp;&nbsp;assert not rc.outstanding, "don't clear history when tasks are outstanding"</div>
<div> &nbsp;&nbsp;&nbsp;&nbsp;rc.history = []</div><div> &nbsp;&nbsp;&nbsp;&nbsp;dview.history = []</div></div><div><br></div><div>But given that the memory error occurs only after several successful trials, something must be accumulating. Are there other sources of caching that I'm missing? Is anything cached on the engines, for instance? I do not keep my results in memory between trials; I dump them to files with cPickle.</div>
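<div><br></div><div>For context, here is a minimal, self-contained sketch of how the subroutine sits in the trial loop. FakeClient, FakeView, and run_trials are hypothetical stand-ins made up so the snippet runs without a cluster; they only mimic the attributes that clear_cache touches, and the body of run_experiment is a placeholder, not the real experiment.</div>

```python
class FakeClient(object):
    """Hypothetical stand-in for the client: only the attributes clear_cache uses."""
    def __init__(self):
        self.results = {}
        self.metadata = {}
        self.history = []
        self.outstanding = set()


class FakeView(object):
    """Hypothetical stand-in for the direct view."""
    def __init__(self):
        self.results = {}
        self.history = []


def clear_cache(rc, dview):
    # Same subroutine as above: drop cached results and metadata,
    # then reset the history once nothing is outstanding.
    rc.results.clear()
    rc.metadata.clear()
    dview.results.clear()
    assert not rc.outstanding, "don't clear history when tasks are outstanding"
    rc.history = []
    dview.history = []


def run_experiment(rc, dview, trial):
    # Placeholder: pretend one task ran and left entries in every client-side cache.
    msg_id = "trial-%d" % trial
    rc.results[msg_id] = trial
    rc.metadata[msg_id] = {"trial": trial}
    rc.history.append(msg_id)
    dview.results[msg_id] = trial
    dview.history.append(msg_id)


def run_trials(n_trials):
    rc, dview = FakeClient(), FakeView()
    for i in range(n_trials):
        run_experiment(rc, dview, i)
        clear_cache(rc, dview)  # clear the client-side caches between trials
    return rc, dview
```

<div>This only shows that the client-side caches are emptied after every trial; it says nothing about what the controller or the engines keep in memory.</div>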
<div><br></div><div>-Robert</div><div><br></div><div>The full error from the controller's standard error is included below:</div><div><br></div>
<div><div>ERROR:root:Uncaught exception, closing connection.</div><div>Traceback (most recent call last):</div><div> File "/software/linux/x86_64/epd-7.3-1/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 391, in _handle_events</div>
<div> self._handle_recv()</div><div> File "/software/linux/x86_64/epd-7.3-1/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 412, in _handle_recv</div><div> msg = self.socket.recv_multipart(zmq.NOBLOCK, copy=self._recv_copy)</div>
<div> File "socket.pyx", line 723, in zmq.core.socket.Socket.recv_multipart (zmq/core/socket.c:6495)</div><div> File "socket.pyx", line 616, in zmq.core.socket.Socket.recv (zmq/core/socket.c:5961)</div>
<div> File "socket.pyx", line 650, in zmq.core.socket.Socket.recv (zmq/core/socket.c:5832)</div><div> File "socket.pyx", line 120, in zmq.core.socket._recv_copy (zmq/core/socket.c:1681)</div><div> File "message.pyx", line 75, in zmq.core.message.copy_zmq_msg_bytes (zmq/core/message.c:1082)</div>
<div>MemoryError</div><div>ERROR:root:Exception in I/O handler for fd &lt;zmq.core.socket.Socket object at 0x162a6b0&gt;</div><div>Traceback (most recent call last):</div><div> File "/software/linux/x86_64/epd-7.3-1/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 330, in start</div>
<div> self._handlers[fd](fd, events)</div><div> File "/software/linux/x86_64/epd-7.3-1/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 391, in _handle_events</div><div> self._handle_recv()</div>
<div> File "/software/linux/x86_64/epd-7.3-1/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 412, in _handle_recv</div><div> msg = self.socket.recv_multipart(zmq.NOBLOCK, copy=self._recv_copy)</div>
<div> File "socket.pyx", line 723, in zmq.core.socket.Socket.recv_multipart (zmq/core/socket.c:6495)</div><div> File "socket.pyx", line 616, in zmq.core.socket.Socket.recv (zmq/core/socket.c:5961)</div>
<div> File "socket.pyx", line 650, in zmq.core.socket.Socket.recv (zmq/core/socket.c:5832)</div><div> File "socket.pyx", line 120, in zmq.core.socket._recv_copy (zmq/core/socket.c:1681)</div><div> File "message.pyx", line 75, in zmq.core.message.copy_zmq_msg_bytes (zmq/core/message.c:1082)</div>
<div>MemoryError</div><div>FATAL ERROR: OUT OF MEMORY (epoll.cpp:57)</div><div>/usr/share/gridengine/hpc/spool/cloudcompute-5/job_scripts/1998: line 14: 31003 Aborted (core dumped) ipcontroller --profile=sge</div>
</div>