[IPython-User] questions about IPython.parallel

Francesco Montesano franz.bergesund@gmail....
Wed Oct 24 05:36:00 CDT 2012


Dear list,

I have a bunch of code designed to repeat the same operation over a
(possibly large) number of files. So after discovering IPython.parallel
not long ago, I decided to rewrite it so that I can use a task scheduler
(I use load_balanced_view) in order to make the best possible use of my
quad-core machines.
Here is the typical structure of my code:

###### BEGIN example.py ######
#imports

def command_line_parsing( ... ):
    "in my case argparse"

def do_some_operation( ... ):
    "executes some mathematical operation"

def read_operate_save_file( file, ... ):
    """reads the file, does operations and saves to an output file"""
    input = np.loadtxt( file )
    do_some_operation(   )    # [1]
    np.savetxt( outfile, ..... )

if __name__ == "__main__":

    args = command_line_parsing( )

    # parallelisation can be chosen or not
    if args.parallel:
        # checks that IPython is there, that an ipcluster has been started,
        # initialises a Client and a load_balanced_view. I can pass a string or
        # list of strings to be executed on all engines (I use it to "import xxx as x")
        lview = IPp.start_load_balanced_view( to_execute )

    if not args.parallel:   # for serial computation
        for fn in args.ifname:  # [2] file name loop
            output = read_operate_save_file(fn, dis, **vars(args) )
    else:   # I want parallel computation
        # [3]
        runs = [ lview.apply( read_operate_save_file,
                              os.path.abspath(fn.name), ... )
                 for fn in args.ifname ]
        results = [r.result for r in runs]

###### END example.py ######

I have two questions:
[1] In function 'read_operate_save_file', I call 'do_some_operation'. When I
work in serial mode, everything works fine, but in parallel mode I get
the error
"IPython.parallel.error.RemoteError: NameError(global name
'do_some_operation' is not defined)"
I'm not surprised by this, as I imagine that each engine knows only what has
been executed or defined on it before, and that lview.apply( func, ... ) just
passes "func" to the engines. A solution that I see is to run "from example
import do_some_operation" on the engines when initialising the
load_balanced_view. Is there an easier/safer way?
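
To make clearer what I imagine is happening in [1], here is a minimal sketch
that reproduces the NameError without any cluster. It only imitates an engine
by rebuilding the function on top of a different global namespace; this is my
assumption about the mechanism, not how IPython does it internally. The helper
'run_in_namespace' and the one-argument functions are hypothetical stand-ins.

```python
import types

def do_some_operation(x):
    """Stand-in for the real mathematical operation (hypothetical)."""
    return 2 * x

def read_operate(x):
    """Calls a helper that lives in this module's global namespace."""
    return do_some_operation(x)

def run_in_namespace(func, namespace, *args):
    """Rebuild func on top of a different globals dict and call it.

    This imitates an engine that received the function body but not
    the rest of the module it was defined in.
    """
    remote = types.FunctionType(func.__code__, namespace, func.__name__)
    return remote(*args)

# An "engine" that knows nothing about this module: the helper is missing.
try:
    run_in_namespace(read_operate, {}, 3)
    failed = False
except NameError:
    failed = True

# An "engine" on which the helper has been defined (or imported) first.
ok = run_in_namespace(read_operate, {"do_some_operation": do_some_operation}, 3)
```

If this picture is right, the call only succeeds once the helper exists in the
namespace where the function actually runs, which is what importing it on the
engines beforehand would achieve.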

[2] Because of the way I parse my command line arguments, args.ifname is a
list of already opened files. In serial mode this is no problem, but when I
submit the function to the scheduler and pass it the file object, I get an
error saying that it cannot operate on a closed file. If I pass the file name
with the absolute path instead, numpy can read it without problems. Is this
behaviour expected, or is it a bug?
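
A small sketch of the difference I see in [2], using only the standard
library. It assumes that arguments are serialised (pickled) before being sent
to the engines, which I have not verified. On a current Python 3 the attempt to
pickle an open handle fails outright; my guess is that other versions may
instead deliver an unusable file on the other side. The file content and the
'count_lines' worker are hypothetical.

```python
import os
import pickle
import tempfile

# Write a small throwaway data file (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1.0 2.0\n3.0 4.0\n")
    path = os.path.abspath(tmp.name)

def count_lines(filename):
    """Worker that opens the file itself, as an engine would have to."""
    with open(filename) as fh:
        return sum(1 for _ in fh)

# A path is a plain string and survives serialisation unchanged.
sent_path = pickle.loads(pickle.dumps(path))
n_lines = count_lines(sent_path)

# An open handle is tied to this process and cannot be shipped as-is.
with open(path) as handle:
    try:
        pickle.dumps(handle)
        handle_picklable = True
    except TypeError:
        handle_picklable = False

os.remove(path)
```

Passing the absolute path and letting the worker open the file itself avoids
depending on any state that exists only in the submitting process.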

Thanks for any help,

Cheers,
Francesco

