[IPython-User] questions about IPython.parallel
Wed Oct 24 12:37:56 CDT 2012
On Wed, Oct 24, 2012 at 3:36 AM, Francesco Montesano wrote:
> Dear list,
> I have a bunch of code designed to repeat the same operation over a
> (possibly large) number of files. So after discovering IPython.parallel not
> long ago, I decided to rewrite it to give me the possibility to use a task
> scheduler (I use load_balance_view) in order to make the best use possible
> of my quad core machines.
> Here is the typical structure of my code
> ###### BEGIN example.py ######
> def command_line_parsing( ... ):
>     "in my case argparse"
>
> def do_some_operation( ... ):
>     "executes some mathematical operation"
>
> def read_operate_save_file( file, ... ):
>     """reads the file, does operations and saves to an output file"""
>     input = np.loadtxt( file )
>     do_some_operation( )
>     np.savetxt( outfile, ..... )
>
> if __name__ == "__main__":
>     args = command_line_parsing( )
>     # parallelisation can be chosen or not
>     if args.parallel:
>         # checks that IPython is there, that an ipcluster has been started,
>         # initialises a Client and a load_balance_view. I can pass a string or
>         # list of strings to be executed on all engines (I use it to
>         # "import xxx as x")
>         lview = IPp.start_load_balanced_view( to_execute )
>     if args.parallel == False:  # for serial computation
>         for fn in args.ifname:  # file name loop
>             output = read_operate_save_file(fn, dis, **vars(args) )
>     else:  # I want parallel computation
>         runs = [ lview.apply( read_operate_save_file,
>                  os.path.abspath(fn.name), ... ) for fn in args.ifname ]
>         results = [r.result for r in runs]
> ###### END example.py ######
> I have two questions:
> In function 'read_operate_save_file', I call 'do_some_operation'. When I
> work in serial mode, everything works fine, but in parallel mode I get
> the error
> "IPython.parallel.error.RemoteError: NameError(global name
> 'do_some_operation' is not defined)"
> I'm not surprised by this, as I imagine that each engine knows only what
> has been executed or defined before and that lview.apply( func, ... ) just
> passes "func" to the engines. A solution that I see is to run "from example
> import do_some_operation" on the engines when initialising the
> load_balance_view. Is there any easier/safer way?
This namespace issue is common, and I have explanations scattered about,
which I really need to consolidate into a single thorough explanation.
But the gist:
- If a function is importable (e.g. in a module available both locally and
remotely), then it's no problem
- If it is defined in __main__ (e.g. in a script), then any references will
be resolved in the *engine* namespace
I recommend conforming to the first case if feasible, because then there
should be no surprises.
Everything surprising happens when you depend on references in
`__main__` or the current working dir (e.g. locally imported modules),
since `__main__` is not the same on the various machines, nor is the
working dir (necessarily).
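
To make the second case concrete, here is a minimal sketch that uses only the standard library, so no cluster is needed. It assumes that rebuilding a function from its code object inside a different globals dict is a fair stand-in for what happens to a `__main__` function on an engine. The dict `engine_ns` and the function bodies are made up for illustration and are not IPython API.

```python
import builtins
import types


def do_some_operation(x):
    # Hypothetical stand-in for the real mathematical operation.
    return x * 2


def read_operate_save_file(x):
    # References a global name; it is looked up at call time.
    return do_some_operation(x)


# Simulated engine namespace: it does not contain do_some_operation.
engine_ns = {"__builtins__": builtins}

# Rebuild the function so its globals are the "engine" namespace.
remote_func = types.FunctionType(read_operate_save_file.__code__, engine_ns)

try:
    remote_func(3)
    error = None
except NameError as exc:
    error = str(exc)

# "Pushing" the helper into the namespace makes the same call succeed.
engine_ns["do_some_operation"] = do_some_operation
result = remote_func(3)
```

The first call fails with a NameError because the name is looked up in the simulated engine namespace, not in the script where the function was written. Once the helper is present there, the identical call works.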
That said, if the names you need to resolve are few, a simple import/push
step with a DirectView to set up namespaces should be all you need prior to
submitting tasks (assuming new engines are not arriving in mid-computation).
from IPython.parallel import Client

rc = Client()
dv = rc[:]
# push any locally defined functions that your task function uses:
dv['do_some_operation'] = do_some_operation
# perform any imports that are needed (add further imports as required):
dv.execute("import numpy as np")
# continue as before:
lview = IPp.start_load_balanced_view( to_execute )
> Because of the way I parse my command line arguments, args.ifname is a
> list of already opened files. In serial mode, this is no problem, but when I
> assign the function to the scheduler passing the file, I get an error
> saying that it cannot work on a closed file. If I pass the file name with the
> absolute path, numpy can read it without problem. Is this a behaviour to be
> expected or a bug?
I would expect a PickleError when you try to send an open file. Definitely
send filenames, not open file objects.
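
A quick way to check this locally, without a cluster, is to try pickling both kinds of argument yourself. The sketch below is only an illustration: the helper `try_pickle` and the temporary file are hypothetical, and since the exact exception type depends on the Python version, the helper catches both `pickle.PicklingError` and `TypeError`.

```python
import os
import pickle
import tempfile


def try_pickle(obj):
    """Return True if obj can be pickled, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError):
        return False


# Create a small throwaway data file to experiment with.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1.0 2.0\n")
    path = tmp.name

# An open file object cannot be serialised for sending to an engine...
with open(path) as fh:
    file_ok = try_pickle(fh)

# ...but its absolute path is a plain string and pickles fine.
name_ok = try_pickle(os.path.abspath(path))

os.remove(path)
```

Here `file_ok` ends up False and `name_ok` True, which is why passing `os.path.abspath(fn.name)` works where passing the file object does not.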
> Thanks for any help,