[IPython-User] questions about IPython.parallel
Wed Oct 24 05:36:00 CDT 2012
I have a bunch of code designed to repeat the same operation over a
number of files. After discovering IPython.parallel not long ago, I decided to
rewrite it so that I can use a task scheduler (I use a
load_balanced_view) in order
to make the best use of my quad-core machines.
Here is the typical structure of my code:
###### BEGIN example.py ######
import os
import numpy as np

def command_line_parsing( ... ):
    "in my case argparse"

def do_some_operation( ... ):
    "executes some mathematical operation"

def read_operate_save_file( file, ... ):
    """reads the file, does operations and saves to an output file"""
    input = np.loadtxt( file )
    do_some_operation( ... )
    np.savetxt( outfile, ... )

if __name__ == "__main__":
    args = command_line_parsing( )
    # parallelisation can be chosen or not
    if args.parallel:
        # checks that IPython is there and that an ipcluster has been
        # started, then initialises a Client and a load_balanced_view.
        # I can pass a string or list of strings to be executed on all
        # engines (I use it to "import xxx as x")
        lview = IPp.start_load_balanced_view( to_execute )
    if not args.parallel:  # for serial computation
        for fn in args.ifname:  # file name loop
            output = read_operate_save_file( fn, dis, **vars(args) )
    else:  # I want parallel computation
        runs = [ lview.apply( read_operate_save_file,
                              os.path.abspath(fn.name), ... )
                 for fn in args.ifname ]
        results = [ r.result for r in runs ]
###### END example.py ######
I have two questions:
1. In function 'read_operate_save_file', I call 'do_some_operation'. When I
work in serial mode, everything works fine, but in parallel mode I get
"IPython.parallel.error.RemoteError: NameError(global name
'do_some_operation' is not defined)".
I'm not surprised by this, as I imagine that each engine knows only what has
been executed or defined on it before, and that lview.apply( func, ... ) just
passes "func" to the engines. A solution I see is to run "from example import
do_some_operation" on the engines when initialising the load_balanced_view. Is
there any easier/safer way?
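To check that understanding without a cluster, here is a toy simulation of
what I think happens (do_some_operation/read_operate are throwaway stand-ins,
and types.FunctionType is just used to mimic rebuilding the shipped function
inside an engine's empty namespace):

```python
import types

def do_some_operation(x):
    """toy stand-in for the real mathematical operation"""
    return x * 2

def read_operate(x):
    """calls a helper defined in the same module, like my real code"""
    return do_some_operation(x)

engine_ns = {}  # a fresh namespace, like a freshly started engine

# Rebuild the function against the engine's globals instead of ours,
# roughly what shipping only "func" amounts to:
remote_func = types.FunctionType(read_operate.__code__, engine_ns)

err = None
try:
    remote_func(3)
except NameError as e:
    err = e
print("engine raises:", err)  # name 'do_some_operation' is not defined

# Making the dependency visible on the "engine" fixes it, which is why
# an import executed on all engines works:
engine_ns["do_some_operation"] = do_some_operation
print(remote_func(3))  # → 6
```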
2. Because of the way I parse my command line arguments, args.ifname is a
list of already opened files. In serial mode this is no problem, but when I
assign the function to the scheduler passing the file object, I get an error
saying that it cannot work on a closed file. If I pass the file name with the
absolute path instead, numpy can read it without problem. Is this behaviour
to be expected, or is it a bug?
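For what it's worth, I can reproduce something similar with plain pickle,
which is (as far as I understand) roughly how apply() has to ship its
arguments to the engines; the file here is a throwaway stand-in:

```python
import os
import pickle
import tempfile

# a throwaway data file standing in for one of my input files
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("1 2 3\n")

fh = open(path)  # what argparse's FileType hands me: an already-open handle
try:
    pickle.dumps(fh)  # shipping the argument means serialising it
    shippable = True
except TypeError:
    shippable = False
fh.close()
print("open handle shippable:", shippable)  # → False

# a plain absolute path serialises fine, and each engine reopens it locally
payload = pickle.dumps(os.path.abspath(path))
reopened = open(pickle.loads(payload)).read()
print(reopened)
```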
Thanks for any help,