[SciPy-dev] cow: 'Connection reset by peer' timeout problem?
simon.saubern at molsci.csiro.au
Wed Jan 29 19:34:11 CST 2003
I'm re-posting this message here as I didn't get any replies on the
I've been using cow to try out some distributed calculations.
Everything works fine if I use a subset of my data, but when I use
the full set I get "error: (10054, 'Connection reset by peer')"
messages on the master unit (see below for full output).
I can operate on larger and larger subsets until the slaves take more
than about 4 minutes to complete a task; beyond that point, the above
error appears at the master.
That is, connections are established (confirmed using netstat),
processing occurs on the slaves and keeps going, but the master times
out after about 4 minutes.
Is this a 'keep alive' problem? If so, how can I extend the timeout period?
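For reference, a minimal sketch of what "keep alive" and "timeout" mean at the plain socket level. It assumes a current Python 3 standard library and says nothing about whether cow exposes its underlying sockets. The helper name make_keepalive_socket and its timeout_seconds parameter are hypothetical and are not part of cow or scipy.

```python
import socket


def make_keepalive_socket(timeout_seconds=None):
    """Create a TCP socket with keep-alive enabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the operating system to send keep-alive probes on an idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # None means blocking reads wait indefinitely; a number is a limit in seconds.
    sock.settimeout(timeout_seconds)
    return sock


sock = make_keepalive_socket()
keepalive_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

How often keep-alive probes are sent is decided by the operating system, so this only shows where such a setting would live, not that it cures the 10054 error.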
10 x slave + master, all Win2K SP-3
latest scipy binary for Win
cowname['data']=data # a list 35000 long.
lendata=range(7000) # just use a subset
while not bessy:
    ...  # bessy gets processed here
'data' is quite large and takes a while to transfer over the network.
But by doing it once and looping over the index, I minimize network
movements. The 'python' process on each slave uses about 85MB.
Increasing 'lendata' eventually causes the 'Connection reset by peer'
message to appear.
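One way to probe where the limit sits is to keep 'data' on the slaves and hand out the index range in smaller batches, so that each remote call finishes well inside the 4 minutes. The sketch below only computes the batches. The function name index_batches is hypothetical, and how each batch would be passed to cow is deliberately left out.

```python
def index_batches(n_items, batch_size):
    """Yield (start, stop) index pairs that cover range(n_items) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_items, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(start + batch_size, n_items)


# 35000 items, in batches the size of the subset that is known to work.
batches = list(index_batches(35000, 7000))
```

Each (start, stop) pair stands for one task, so the large list still crosses the network only once.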
Any pointers welcomed.
  line 823, in loop_code
  line 847, in loop_send_recv
    results = self._send_recv(package,addendums)
  line 345, in _send_recv
    self.last_results = self._recv()
  line 303, in _recv
  line 404, in recv
    package = self.channel.read()
  line 164, in read
    x = self.rfile.read()
  File "c:\Program Files\Python22\lib\socket.py", line 228, in read
    new = self._sock.recv(k)
error: (10054, 'Connection reset by peer')