[IPython-user] ipython 0.9 ipengine launch question

Brant Peterson brantp@berkeley....
Wed Sep 10 21:19:52 CDT 2008


Brian (et al.),

First off, OS, version, etc:
Ubuntu 8.04, IPython 0.9beta3, Python 2.5.2

And on to the problem at hand:
I've been trying to get essentially one piece of code to work in the
ipengine/controller framework, and it's conceptually fairly simple.
It basically reads and writes a few files, then makes a handful of
system calls (both os.system and subprocess.Popen), cleans up a few
files and exits.  This works like a champ if the engines are run via
ipcluster -n 4 (or however many).  But if I run ipcontroller and then
launch the engines separately, the error described below springs up.
That happens whether I launch via ipcluster -f as described below
(even when the engines are only on the local machine), launch a few
engines by invoking ipengine manually, or run them via mpiexec with
--mpi=mpi4py.

The Engine Exception is thrown in the terminal in which the script is
run (or in the terminal running ipython, if commands are executed
interactively), while the Fatal Python error shows up in the terminal
buffer in which the ipcontroller and ipengine processes were launched.

I'm not overtly using any C or C++, although it's possible that
os.system and/or subprocess.Popen use C's system call, and at least
subprocess.Popen could plausibly be using threads...
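
For concreteness, the pattern in question looks roughly like the sketch
below.  This is not the actual script: the run_step helper, the
input.txt file name and the cat commands are made up for illustration,
and it assumes a POSIX system.  As far as I know, os.system wraps the C
library's system(), and on POSIX subprocess.Popen forks and execs a
child rather than starting threads.

```python
import os
import subprocess
import tempfile


def run_step(workdir):
    """Write a file, shell out twice, clean up (hypothetical stand-in job)."""
    path = os.path.join(workdir, "input.txt")  # hypothetical file name
    with open(path, "w") as handle:
        handle.write("hello\n")

    # os.system hands the string to the shell and returns its exit status;
    # 0 means the command succeeded.
    status = os.system("cat '%s' > /dev/null" % path)

    # subprocess.Popen with a pipe; communicate() reads the output and
    # waits for the child process to exit.
    proc = subprocess.Popen(["cat", path], stdout=subprocess.PIPE)
    out, _ = proc.communicate()

    # Clean up the scratch file before returning.
    os.remove(path)
    return status, proc.returncode, out


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    try:
        print(run_step(workdir))
    finally:
        os.rmdir(workdir)
```

Running something of this shape on each engine is all the real code
does, so a small reproduction should only need a function like this
pushed to the engines and executed.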

Incidentally, while I originally ran into this with the .execute()
method of the MultiEngineClient, I've since re-implemented the
script using the TaskClient framework (which is brilliant!) and see
the same problem.

As far as the code that generates the problem, it's fairly gnarly, but
I'll start trying to package it up for human consumption :)

Thank you again,
Brant

On Wed, Sep 3, 2008 at 1:20 PM, Brian Granger <ellisonbg.net@gmail.com> wrote:
> Brant,
>
> I do want to try to understand what is going on here.
>
>> I'm using ipython 0.9 to run fairly standard commands (.push, .pull,
>> .execute, etc) on ipengine/ipcontroller clusters, and I've been trying
>> to get the -f flag working properly.  My clusterfile.py script looks
>> like:
>>
>> controller = {'host':'kakahiaka',
>>                  }
>>
>> engines = {'kakahiaka':4,
>>
>>               }
>>
>> sshx = '/mnt/py_util/sshx'
>>
>>
>> and sshx reads:
>> #!/bin/sh
>> NUMPROCS=$(cat /proc/cpuinfo | grep -e "processor[[:space:]]:" | wc -l)
>> export NUMPROCS
>> export PATH=$PATH:$HOME/bin:$HOME/multiz:.
>> export PYTHONPATH=/mnt/py_util
>> "$@"
>>
>> In addition, I had to change line 306 of ipcluster.py from
>> cmd = "ssh %s '%s' 'ipengine --controller-ip %s --logfile %s' &" % \
>> (engineHost,sshx,contHost,engLog)
>> to:
>> cmd = "ssh %s '%s' 'ipengine --logfile %s' &" % (engineHost,sshx,engLog)
>> since the new version of ipengine uses foolscap urls instead of
>> straight hostnames.
>
> Yep, this is a bug in ipcluster.py.  I will work on fixing this today.
>
>> Everything seems to come up fine for the -f run mode, and I can run
>> for about 5-10min, before something hiccups, and I get the following
>> errors:
>> in the logfile:
>> 2008/08/27 14:33 -0700 [-] unregistered engine with id: 0
>>
>> in my stderr buffer:
>> [Engine Exception]ConnectionLost: Connection to the other side was
>> lost in a non-clean fashion.
>>
>> and thrown on the local console:
>> Fatal Python error: PyEval_RestoreThread: NULL tstate
>> Aborted
>
> Wow, this is really weird, especially because we don't really use
> threads.  Hmmm.  Can you give more information?
>
> Are you using threads in any way?
> Can you produce code that triggers this?
> Which stderr buffer shows that EngineException?
> Python version?
> OS and platform?
>
> Let's see if we can get to the bottom of this...
>
> Brian
>
>
>> The sum of my experience with python threads has been googling for
>> error messages, so my hope is that this is something fairly obvious
>> that I'm just not doing.
>>
>> Any help, or suggestions about what to try would be greatly appreciated!
>> -Brant

