[IPython-user] [0:execute]: IOError: [Errno 4] Interrupted system call

mark starnes m.starnes05@imperial.ac...
Tue Dec 2 09:23:49 CST 2008


Hi everyone.

I removed Pypar; no effect.  The USB control isn't in use AFAIK.
I'm not using PBS (though I hope to at a later point).  I am
accessing this program on a remote machine, using

$ ssh -X machinename

with one ssh session for the ipcluster command and one for the ipython
console.  The remote machine runs 64-bit SUSE 10.3 and is running
a vncserver (if this is of any relevance).

--------------------------------------
Anyway, I'm now finding that if I run from the IPython command
line, I can't get the error to occur.  On closing IPython, however,
I get:

In [4@16:28:54]:
Do you really want to exit ([y]/n)? y
Closing threads... Done.
Exception exceptions.TypeError: "'NoneType' object is not callable" in <bound method RemoteReferenceTracker._refLost of <RemoteReferenceTracker(clid=1,url=pbu://127.0.0.1:21778/uwtv36uev6e7emd45xfz77tt75krrduj)>> ignored

implying, perhaps, that it is occurring but not being reported.

--------------------------------------
If I run from the shell prompt, via

$ fe2.py fefile.pyFE

I get an error on engine 0 every time.  I believe the code executed
this way is the same as what I type in at the console; I've just
wrapped it in an

if __name__ == '__main__':

section.  The error I'm getting is slightly different from the old one,
but it occurs at exactly the same line numbers as before.  The error is
now:

CompositeError: one or more exceptions from call to method: execute
[0:execute]: AttributeError: 'module' object has no attribute 'Solver'

'Solver' is the class I'm pushing to all nodes.  Inspecting the
contents of the nodes shows the push was not successful on any of them
(expected, given engine 0 appeared to fail).
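The thread never pins down this AttributeError. One possible cause worth ruling out (an assumption, not something confirmed anywhere in the thread) is that pickle serializes classes by reference, recording only the module name and class name, never the class body. A class defined in a script run as `fe2.py fefile.pyFE` lives in that process's `__main__`, so an engine unpickling it would look for `Solver` in its own `__main__`. The minimal sketch below simulates this with plain `pickle` only, not IPython's push machinery; the module `engine_main` and the stand-in `Solver` class are hypothetical.

```python
import pickle
import sys
import types


class Solver:
    """Hypothetical stand-in for the class being pushed to the engines."""


def simulate_push_of_script_class():
    """Pickle a class by reference, then load it where the class is missing.

    'engine_main' is a hypothetical module playing the role of an engine's
    __main__, which never ran the script that defined Solver.
    """
    engine_main = types.ModuleType("engine_main")
    sys.modules["engine_main"] = engine_main

    # Pretend Solver was defined in that module, the way a class defined
    # in a script lives in the __main__ module of the process running it.
    Solver.__module__ = "engine_main"
    engine_main.Solver = Solver

    # pickle records only the reference ("engine_main", "Solver").
    payload = pickle.dumps(Solver)

    # Loading the reference where the namespace lacks Solver fails.
    del engine_main.Solver
    try:
        pickle.loads(payload)
    except AttributeError as exc:
        return payload, exc
    return payload, None


if __name__ == '__main__':
    payload, error = simulate_push_of_script_class()
    print("unpickling failed:", error)
```

If this does turn out to be the cause, the usual workaround is to move `Solver` into its own module that every engine can import.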

Could it be that I'm using the wrong version of twisted, or some other
package?

Quitting the IPython console after this returns a similar result
to the one seen when running purely from the console:

Exception exceptions.TypeError: "'NoneType' object is not callable" in <bound method RemoteReferenceTracker._refLost of <RemoteReferenceTracker(clid=1,url=pbu://127.0.0.1:21778/uwtv36uev6e7emd45xfz77tt75krrduj)>> ignored



Thanks for your time on this.  I'm really stumped.  :(

BR,

Mark.
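The open question in the replies quoted here is which process or library sends the signal that interrupts engine 0. A minimal diagnostic sketch, assuming modern Python 3 on Unix, is to install a logging handler in the engine for a few candidate signals and see which one arrives. The candidate list and the names `received`, `log_signal` and `install_signal_logger` are hypothetical. Replacing a handler that other code relies on changes that code's behavior, which is why the sketch leaves SIGINT and SIGCHLD alone.

```python
import signal

# Every signal caught so far, as (signal name, "file:line" it interrupted).
received = []


def log_signal(signum, frame):
    """Record which signal arrived and where the main thread was."""
    name = signal.Signals(signum).name
    if frame is not None:
        where = "%s:%d" % (frame.f_code.co_filename, frame.f_lineno)
    else:
        where = "unknown"
    received.append((name, where))
    print("caught %s at %s" % (name, where))


def install_signal_logger(signums):
    """Log, rather than act on, each signal in signums (Unix only)."""
    for signum in signums:
        signal.signal(signum, log_signal)
        # Ask for interrupted system calls to be restarted, not fail with EINTR.
        signal.siginterrupt(signum, False)


if __name__ == '__main__':
    # Hypothetical candidate list; adjust to whatever might be in play.
    install_signal_logger([signal.SIGUSR1, signal.SIGUSR2,
                           signal.SIGALRM, signal.SIGHUP])
    # Self-test: raise_signal (Python 3.8+) sends a signal to this process.
    signal.raise_signal(signal.SIGUSR1)
    print(received)
```

With the logger installed in an engine, sending `kill -USR1` to the engine's PID from another shell should produce a log line, confirming the handler is live before waiting for the real culprit.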




Brian Granger wrote:
> Just checked in Twisted and IPython and there are no os.kill's
> anywhere in the core that would show up here.  That leaves:
> 
> 1.  User code
> 2.  Some other process.  Is there a chance that the engines are being
> started using some batch system (like PBS) that ends up sending the
> process signals?
> 
> Brian
> 
> 
> 
> On Mon, Dec 1, 2008 at 2:21 PM, Andrew Straw <strawman@astraw.com> wrote:
>> Brian Granger wrote:
>>>> Sorry I don't have any specific information concerning your situation,
>>>> but as it's been a few days with no response, I figured I'd chime in. My
>>>> understanding is that an interrupted system call (EINTR) happens when a
>>>> system call (e.g. select(), fread(), fwrite(), and so on) is interrupted
>>>> by a signal that the kernel decides your thread is going to handle. The
>>>> correct behavior is to deal appropriately with the signal (possibly
>>>> ignoring it) and then repeat your system call.
>>>>
>>> Yep, I think that is what is going on.  Is there a chance that the
>>> signal is anything other than EINTR?  I ask as that will help us track
>>> this down.
>>>
>> Sorry, I wasn't clear here. EINTR is the errno value that's returned
>> here. The signal could be anything.
>>>> I've found lots of code in the wild that is not robust to being
>>>> interrupted this way (including in core Python), but luckily very little
>>>> code that sends signals and thus interrupts code that way. So, I think
>>>> the best solution will be to find what is sending the signal and
>>>> eliminate it. Hopefully this will ring some bells with someone more
>>>> knowledgeable in the ipython internals than I. Also, are you running any
>>>> third party code that could be sending signals?
>>>>
>>> There are a couple of possibilities:
>>>
>>> 1.  Something deep in the internals of Python itself.
>>> 2.  Something deep in Twisted
>>> 3.  It wouldn't be in IPython as we (as far as I know) are not sending
>>> any signals.
>>> 4.  Deep somewhere in user code that they are not aware of.
>>>
>>> My best guesses are Twisted or in user code.  I will look at Twisted
>>> to see if it sends signals anywhere.  Is it also possible that the
>>> kernel itself sends the signal?
>>>
>> I've never experienced the kernel sending unrequested signals other than
>> the usual things like SIGINT and SIGKILL... But I suppose it could be
>> configured to do so somehow.
>>
>> My best guess is that this is more likely to be in user code than in
>> Twisted or the other sources you list. I suspect that if signals were
>> used as part of Python or Twisted, folks attempting to  use
>> non-EINTR-safe code (there's a lot of it) would've come screaming by
>> now. My relatively naive understanding of this stuff is that signals are
>> the blunt tool of inter-process communication and I see the Twisted crew
>> as more the fine surgeon types... But I haven't grepped for os.kill() in
>> the Twisted sources, either, so I could be wrong. Likewise I don't know
>> about IPython, but I'd be surprised if it was doing IPC via signals. If
>> it is, I'd suggest it be removed, as it will come back to haunt anyone
>> with non-EINTR-safe code.
>>
>> It's debugging these types of things that turns my hair gray...
>>
>> -Andrew
>>
> 
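Andrew's description of the correct behavior (deal with the signal, then repeat the system call) can be sketched as a small wrapper. This is a minimal sketch, assuming the callable reports EINTR the way the traceback in the subject line does, as an `IOError`/`OSError` whose `errno` is `errno.EINTR`; `retry_on_eintr` is a hypothetical helper, not part of IPython or Twisted. Since Python 3.5 (PEP 475) the interpreter retries most interrupted system calls on its own, so a wrapper like this matters mainly on the Python 2 series used in this thread.

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), repeating it as long as it fails with EINTR."""
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as exc:  # IOError is an alias of OSError on Python 3
            if exc.errno != errno.EINTR:
                raise  # a real failure: let it propagate
            # EINTR: the call was interrupted by a signal, so try again.
```

For example, `retry_on_eintr(select.select, rlist, [], [], 5.0)` keeps waiting after an interruption, but each retry restarts the full timeout; PEP 475's built-in retry avoids that by recomputing the remaining time.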

