[IPython-User] questions about IPython.parallel

MinRK benjaminrk@gmail....
Wed Oct 24 12:37:56 CDT 2012


On Wed, Oct 24, 2012 at 3:36 AM, Francesco Montesano <
franz.bergesund@gmail.com> wrote:

> Dear list,
>
> I have a bunch of code designed to repeat the same operation over a
> (possibly large)
> number of files. After discovering IPython.parallel not long ago, I
> decided to
> rewrite it so that I could use a task scheduler (I use a
> load_balanced_view) in order
> to make the best possible use of my quad-core machines.
> Here is the typical structure of my code
>
> ###### BEGIN example.py ######
> #imports
>
> def command_line_parsing( ... ):
>    "in my case argparse"
>
> def do_some_operation( ... ):
>   "executes some mathematical operation"
>
> def read_operate_save_file( file, ... ):
>     """reads the file, does operations and saves to an output file"""
>     data = np.loadtxt( file )
> [1] result = do_some_operation( data )
>     np.savetxt( outfile, ..... )
>
> if __name__ == "__main__":
>
>     args = command_line_parsing( )
>
>     #parallelisation can be chosen or not
>     if args.parallel :
>         #checks that IPython is there and that an ipcluster has been
>         #started; initialises a Client and a load_balanced_view. I can
>         #pass a string or a list of strings to be executed on all
>         #engines (I use it to "import xxx as x")
>         lview = IPp.start_load_balanced_view( to_execute )
>
>     if not args.parallel:   #for serial computation
> [2]     for fn in args.ifname:  #file name loop
>             output = read_operate_save_file( fn, dis, **vars(args) )
>     else:   #I want parallel computation
> [3]     runs = [ lview.apply( read_operate_save_file,
>                  os.path.abspath(fn.name), ... ) for fn in args.ifname ]
>         results = [ r.result for r in runs ]
>
> ###### END example.py ######
>
> I have two questions:
> [1] In function 'read_operate_save_file', I call 'do_some_operation'. When
> I
> work in serial mode, everything works fine, but in parallel mode I get
> the error
> "IPython.parallel.error.RemoteError: NameError(global name
> 'do_some_operation' is not defined)"
> I'm not surprised by this, as I imagine that each engine knows only what
> has been
> executed or defined on it before, and that lview.apply( func, ... ) just
> passes
> "func" to the engines. A solution that I see is to run "from example import
> do_some_operation" on the engines when initialising the load_balanced_view.
> Is
> there any easier/safer way?
>


This namespace issue is common, and I have explanations scattered about the
internet:

http://stackoverflow.com/a/12307741/938949
http://stackoverflow.com/a/10859394/938949
https://github.com/ipython/ipython/issues/2489
http://ipython.org/ipython-doc/dev/parallel/index.html

I really need to consolidate these into a single thorough explanation with
examples.

But the gist:

- If a function is importable (e.g. defined in a module available both
locally and remotely), then there's no problem
- If it is defined in __main__ (e.g. in a script), then any names it
references will be resolved in the *engine's* namespace

I recommend conforming to the first case if feasible, because then there
should be no surprises.
Everything surprising happens when you depend on references in
`__main__` or the current working dir (e.g. locally imported modules),
since `__main__` is not the same on the various machines, nor
(necessarily) is the working dir.
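The reason the importable case is safe is that serializing a function ships
only a *reference* (module name plus function name), not the function's
code, so the engine resolves it by importing the module on its side.
(IPython uses a customized serializer that can do more than plain pickle,
but the reference semantics are the same.) A standalone sketch with plain
`pickle` — the module `example_mod` is built in-process here purely to keep
the sketch self-contained; in real use it would be your `example.py` on
disk, importable on every engine:

```python
import pickle
import sys
import types

# stand-in for an importable example.py, registered so that
# "import example_mod" works in this process:
mod = types.ModuleType("example_mod")
exec("def double(x):\n    return 2 * x\n", mod.__dict__)
sys.modules["example_mod"] = mod

payload = pickle.dumps(mod.double)

# the payload stores a reference (module + name), not the bytecode:
assert b"example_mod" in payload
assert b"double" in payload

# unpickling re-imports the module and looks the name up -- exactly what
# an engine does; a function living only in the client's __main__ has no
# module to import, hence the NameError above.
roundtrip = pickle.loads(payload)
assert roundtrip(21) == 42
```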

That said, if the names you need to resolve are few, a simple import/push
step with a DirectView to set up namespaces should be all you need prior to
submitting tasks (assuming new engines are not arriving in mid-computation).

e.g.:

from IPython.parallel import Client

rc = Client()
dv = rc[:]  # a DirectView on all engines
# push any locally defined functions that your task function uses:
dv['do_some_operation'] = do_some_operation
# perform any imports that are needed on the engines:
dv.execute("import numpy as np...")
# continue as before:
lview = IPp.start_load_balanced_view( to_execute )
...



>
> [2] Because of the way I parse my command line arguments, args.ifname is a
> list of already opened files. In serial mode this is no problem, but when
> I
> assign the function to the scheduler passing the file, I get an error
> saying
> that it cannot work on a closed file. If I pass the file name with the
> absolute path, numpy can read it without problem. Is this behaviour to be
> expected, or a bug?
>

I would expect a PickleError when you try to send an open file.  Definitely
send filenames, not open file objects.
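Plain pickle refuses open file objects outright, while a path string ships
trivially and lets the engine reopen the file itself (np.loadtxt accepts a
filename for exactly this reason). A self-contained sketch, using a temp
file as a stand-in for the data files:

```python
import os
import pickle
import tempfile

# write a small stand-in data file
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("1 2 3\n")

# an open file handle cannot be serialized for shipping to an engine:
fh = open(path)
try:
    pickle.dumps(fh)
    file_pickled = True
except TypeError:
    file_pickled = False
finally:
    fh.close()
assert not file_pickled

# a path string, by contrast, pickles round-trip without trouble:
assert pickle.loads(pickle.dumps(path)) == path

os.remove(path)
```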


>
> Thanks for any help,
>
> Cheers,
> Francesco
> _______________________________________________
> IPython-User mailing list
> IPython-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-user
>