[IPython-User] IPython-User Digest, Vol 108, Issue 29

Ming Li lijianting84@gmail....
Thu Oct 25 02:52:35 CDT 2012


2012/10/25 <ipython-user-request@scipy.org>

> Send IPython-User mailing list submissions to
>         ipython-user@scipy.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://mail.scipy.org/mailman/listinfo/ipython-user
> or, via email, send a message with subject or body 'help' to
>         ipython-user-request@scipy.org
>
> You can reach the person managing the list at
>         ipython-user-owner@scipy.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of IPython-User digest..."
>
>
> Today's Topics:
>
>    1. Re: questions about IPython.parallel (MinRK)
>    2. Re: questions about IPython.parallel (Francesco Montesano)
>    3. Re: questions about IPython.parallel (Min RK)
>    4. error of install ipython 0.10.0 (Jack Bryan)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 24 Oct 2012 10:37:56 -0700
> From: MinRK <benjaminrk@gmail.com>
> Subject: Re: [IPython-User] questions about IPython.parallel
> To: "Discussions about using IPython. http://ipython.org"
>         <ipython-user@scipy.org>
> Message-ID:
>         <
> CAHNn8BXZT4ZuyTEyDR1Jd9Tho9skOdhtNVjTaLmkEuxkzFNgUw@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Wed, Oct 24, 2012 at 3:36 AM, Francesco Montesano <
> franz.bergesund@gmail.com> wrote:
>
> > Dear list,
> >
> > I have a bunch of code designed to repeat the same operation over a
> > (possibly large) number of files. So after discovering IPython.parallel
> > not long ago, I decided to rewrite it to give me the possibility to use
> > a task scheduler (I use load_balance_view) in order to make the best
> > use possible of my quad-core machines.
> > Here is the typical structure of my code:
> >
> > ###### BEGIN example.py ######
> > #imports
> >
> > def command_line_parsing( ... ):
> >    "in my case argparse"
> >
> > def do_some_operation( ... ):
> >   "executes some mathematical operation"
> >
> > def read_operate_save_file( file, ... ):
> >     """reads the file, does operations and save to an output file"""
> >     input = np.loadtxt( file )
> > [1] do_some_operation(   )
> >     np.savetxt( outfile, ..... )
> >
> > if __name__ == "__main__":
> >
> >     args = command_line_parsing( )
> >
> >     #parallelisation can be chosen or not
> >     if args.parallel :
> >         #checks that IPython is there, that an ipcluster has been started
> >         #initialises a Client and a load_balance_view. I can pass a
> >         #string or list of strings to be executed on all engines
> >         #(I use it to "import xxx as x" )
> >         lview = IPp.start_load_balanced_view( to_execute )
> >
> >     if( args.parallel == False ):   #for serial computation
> > [2]     for fn in args.ifname:  #file name loop
> >             output = read_operate_save_file(fn, dis, **vars(args) )
> >     else:   #I want parallel computation
> > [3]     runs = [ lview.apply( read_operate_save_file,
> >                  os.path.abspath(fn.name), ... ) for fn in args.ifname ]
> >         results = [r.result for r in runs]
> >
> > ###### END example.py ######
> >
> > I have two questions:
> > [1] In function 'read_operate_save_file', I call 'do_some_operation'.
> > When I work in serial mode, everything works fine, but in parallel mode
> > I get the error
> > "IPython.parallel.error.RemoteError: NameError(global name
> > 'do_some_operation' is not defined)"
> > I'm not surprised by this, as I imagine that each engine knows only
> > what has been executed or defined before, and that lview.apply( func,
> > ... ) just passes "func" to the engines. A solution that I see is to
> > run "from example import do_some_operation" on the engines when
> > initialising the load_balance_view. Is there any easier/safer way?
> >
>
>
> This namespace issue is common, and I have explanations scattered about the
> internet:
>
> http://stackoverflow.com/a/12307741/938949
> http://stackoverflow.com/a/10859394/938949
> https://github.com/ipython/ipython/issues/2489
> http://ipython.org/ipython-doc/dev/parallel/index.html
>
> Which I really need to consolidate into a single thorough explanation with
> examples.
>
> But the gist:
>
> - If a function is importable (e.g. in a module available both locally and
> remotely), then it's no problem
> - If it is defined in __main__ (e.g. in a script), then any references will
> be resolved in the *engine* namespace
>
> I recommend conforming to the first case if feasible, because then there
> should be no surprises.
> Everything surprising happens when you depend on references in
> `__main__` or the current working dir (e.g. locally imported modules),
> since `__main__` is not the same on the various machines, nor is the
> working dir (necessarily).
>
> That said, if the names you need to resolve are few, a simple import/push
> step with a DirectView to set up namespaces should be all you need prior to
> submitting tasks (assuming new engines are not arriving in
> mid-computation).
>
> e.g.:
>
> rc = Client()
> dv = rc[:]
> # push any locally defined functions that your task function uses:
> dv['do_some_operation'] = do_some_operation
> # perform any imports that are needed:
> dv.execute("import numpy as np...")
> # continue as before:
> lview = IPp.start_load_balanced_view( to_execute )
> ...
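[The effect described above is easy to reproduce locally, without a cluster: `apply` ships only the function itself, so any free names inside it are resolved in the engine's global namespace, and pushing a name into that namespace is exactly what makes the call succeed. A stand-alone sketch, using `types.FunctionType` to mimic an engine's empty namespace — no IPython required; the function names just mirror the example above:]

```python
import types

def do_some_operation(x):
    return x * 2

def read_operate_save_file(x):
    # free name, resolved in the function's *globals* at call time
    return do_some_operation(x)

# Mimic an engine: same code object, but a fresh, empty global namespace,
# which is what the engine has when only the function is shipped over.
engine_ns = {}
remote = types.FunctionType(read_operate_save_file.__code__, engine_ns)

try:
    remote(3)
except NameError as e:
    print(e)          # name 'do_some_operation' is not defined

# The analogue of dv['do_some_operation'] = do_some_operation:
engine_ns['do_some_operation'] = do_some_operation
print(remote(3))      # 6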
>
>
>
> >
> > [2] Because of the way I parse my command line arguments, args.ifname
> > is a list of already opened files. In serial mode, this is no problem,
> > but when I assign the function to the scheduler passing the file, I get
> > an error saying that it cannot work on a closed file. If I pass the
> > file name with the absolute path, numpy can read it without problem.
> > Is this a behaviour to be expected or a bug?
> >
>
> I would expect a PickleError when you try to send an open file.  Definitely
> send filenames, not open file objects.
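[The failure mode is reproducible with pickle alone: an open file object cannot be serialized, while a path string can, and the receiving side simply opens the file itself. A small standard-library illustration — the exact exception message varies by Python version:]

```python
import os
import pickle
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'data.txt')
with open(path, 'w') as f:
    f.write('1 2 3\n')

# Sending the open file object fails at serialization time:
f = open(path)
try:
    pickle.dumps(f)
except TypeError as e:
    print('cannot send open file:', e)
finally:
    f.close()

# Sending the absolute path works; the "engine" opens the file itself:
sent = pickle.loads(pickle.dumps(os.path.abspath(path)))
with open(sent) as g:
    print(g.read().strip())   # 1 2 3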
>
>
> >
> > Thanks for any help,
> >
> > Cheers,
> > Francesco
> > _______________________________________________
> > IPython-User mailing list
> > IPython-User@scipy.org
> > http://mail.scipy.org/mailman/listinfo/ipython-user
> >
>
> ------------------------------
>
> Message: 2
> Date: Wed, 24 Oct 2012 22:07:37 +0200
> From: Francesco Montesano <franz.bergesund@gmail.com>
> Subject: Re: [IPython-User] questions about IPython.parallel
> To: "Discussions about using IPython. http://ipython.org"
>         <ipython-user@scipy.org>
> Message-ID:
>         <
> CAOCdBK+ypeqyWzzXFqC27Yqr56gahXQgTMQUw4rMyh0OLyvG5A@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi Min,
>
> thanks for the answer
>
> 2012/10/24 MinRK <benjaminrk@gmail.com>:
> >
> >
> > On Wed, Oct 24, 2012 at 3:36 AM, Francesco Montesano
> > <franz.bergesund@gmail.com> wrote:
> >>
> >> Dear list,
> >>
> >> I have a bunch of code designed to repeat the same operation over a
> >> (possibly large) number of files. So after discovering IPython.parallel
> >> not long ago, I decided to rewrite it to give me the possibility to use
> >> a task scheduler (I use load_balance_view) in order to make the best
> >> use possible of my quad-core machines.
> >> Here is the typical structure of my code:
> >>
> >> ###### BEGIN example.py ######
> >> #imports
> >>
> >> def command_line_parsing( ... ):
> >>    "in my case argparse"
> >>
> >> def do_some_operation( ... ):
> >>   "executes some mathematical operation"
> >>
> >> def read_operate_save_file( file, ... ):
> >>     """reads the file, does operations and save to an output file"""
> >>     input = np.loadtxt( file )
> >> [1] do_some_operation(   )
> >>     np.savetxt( outfile, ..... )
> >>
> >> if __name__ == "__main__":
> >>
> >>     args = command_line_parsing( )
> >>
> >>     #parallelisation can be can chosen or not
> >>     if args.parallel :
> >>         #checks that Ipython is there, that an ipcluster has been
> started
> >>         #initialises a Client and a load_balance_view. I can pass a
> string
> >> or
> >>         #list of strings to be executed on all engines (I use it to
> >> "import xxx as x" )
> >>         lview = IPp.start_load_balanced_view( to_execute )
> >>
> >>     if( args.parallel == False ):   #for serial computation
> >> [2]     for fn in args.ifname:  #file name loop
> >>             output = read_operate_save_file(fn, dis, **vars(args) )
> >>         else:   #I want parallel computation
> >> [3]         runs = [ lview.apply( read_operate_save_file,
> >> os.path.abspath(fn.name), ... ) for fn in args.ifname ]
> >>           results = [r.result for r in runs]
> >>
> >> ###### END example.py ######
> >>
> >> I have two questions:
> >> [1] In function 'read_operate_save_file', I call 'do_some_operation'.
> >> When I work in serial mode, everything works fine, but in parallel mode
> >> I get the error
> >> "IPython.parallel.error.RemoteError: NameError(global name
> >> 'do_some_operation' is not defined)"
> >> I'm not surprised by this, as I imagine that each engine knows only
> >> what has been executed or defined before, and that lview.apply( func,
> >> ... ) just passes "func" to the engines. A solution that I see is to
> >> run "from example import do_some_operation" on the engines when
> >> initialising the load_balance_view. Is there any easier/safer way?
> >
> >
> >
> > This namespace issue is common, and I have explanations scattered about
> the
> > internet:
> >
> > http://stackoverflow.com/a/12307741/938949
> > http://stackoverflow.com/a/10859394/938949
> > https://github.com/ipython/ipython/issues/2489
> > http://ipython.org/ipython-doc/dev/parallel/index.html
> >
> > Which I really need to consolidate into a single thorough explanation
> with
> > examples.
> >
> > But the gist:
> >
> > - If a function is importable (e.g. in a module available both locally
> and
> > remotely), then it's no problem
> > - If it is defined in __main__ (e.g. in a script), then any references
> will
> > be resolved in the *engine* namespace
> >
> > I recommend conforming to the first case if feasible, because then there
> > should be no surprises.
> > Everything surprising happens when you depend on references in
> > `__main__` or the current working dir (e.g. locally imported modules),
> > since `__main__` is not the same on the various machines, nor is the
> > working dir (necessarily).
> >
> > That said, if the names you need to resolve are few, a simple import/push
> > step with a DirectView to set up namespaces should be all you need prior
> to
> > submitting tasks (assuming new engines are not arriving in
> mid-computation).
> >
> > e.g.:
> >
> > rc = Client()
> > dv = rc[:]
> > # push any locally defined functions that your task function uses:
> > dv['do_some_operation'] = do_some_operation
> I ended up doing the following when initialising the load_balance_view
> dv.execute( 'import sys' )
> dv.execute( 'sys.path.append("path_to_example.py")' )
> dv.execute( 'from example import do_some_operation' )
> Your suggestion looks much neater; just a couple of questions.
> With the push that you suggest, do I simply call
> 'do_some_operation' as in my example, or do I need some different
> syntax?
> Do you think one way or the other is more efficient when the
> function is called and executed?
>
> > # perform any imports that are needed:
> > dv.execute("import numpy as np...")
> > # continue as before:
> > lview = IPp.start_load_balanced_view( to_execute )
> > ...
> >
> >
> >>
> >>
> >> [2] Because of the way I parse my command line arguments, args.ifname
> >> is a list of already opened files. In serial mode, this is no problem,
> >> but when I assign the function to the scheduler passing the file, I get
> >> an error saying that it cannot work on a closed file. If I pass the
> >> file name with the absolute path, numpy can read it without problem.
> >> Is this a behaviour to be expected or a bug?
> >
> >
> > I would expect a PickleError when you try to send an open file.
>  Definitely
> > send filenames, not open file objects.
> Just a curiosity: what is the working directory of the engines? Is the
> one where the ipcluster is started or where the profile is stored?
> (While fixing my code, I ended up passing the filename with the full path)
>
> Thanks again,
>
> Francesco
>
> >
> >>
> >>
> >> Thanks for any help,
> >>
> >> Cheers,
> >> Francesco
> >> _______________________________________________
> >> IPython-User mailing list
> >> IPython-User@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/ipython-user
> >
> >
> >
> > _______________________________________________
> > IPython-User mailing list
> > IPython-User@scipy.org
> > http://mail.scipy.org/mailman/listinfo/ipython-user
> >
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 24 Oct 2012 13:51:22 -0700
> From: Min RK <benjaminrk@gmail.com>
> Subject: Re: [IPython-User] questions about IPython.parallel
> To: "Discussions about using IPython. http://ipython.org"
>         <ipython-user@scipy.org>
> Message-ID: <CCF626CE-650C-4029-A7FA-B7DCDA8A6F30@gmail.com>
> Content-Type: text/plain;       charset=us-ascii
>
>
>
> On Oct 24, 2012, at 13:07, Francesco Montesano <franz.bergesund@gmail.com>
> wrote:
>
> > Hi Min,
> >
> > thanks for the answer
> >
> > 2012/10/24 MinRK <benjaminrk@gmail.com>:
> >>
> >>
> >> On Wed, Oct 24, 2012 at 3:36 AM, Francesco Montesano
> >> <franz.bergesund@gmail.com> wrote:
> >>>
> >>> Dear list,
> >>>
> >>> I have a bunch of code designed to repeat the same operation over a
> >>> (possibly large) number of files. So after discovering IPython.parallel
> >>> not long ago, I decided to rewrite it to give me the possibility to use
> >>> a task scheduler (I use load_balance_view) in order to make the best
> >>> use possible of my quad-core machines.
> >>> Here is the typical structure of my code:
> >>>
> >>> ###### BEGIN example.py ######
> >>> #imports
> >>>
> >>> def command_line_parsing( ... ):
> >>>   "in my case argparse"
> >>>
> >>> def do_some_operation( ... ):
> >>>  "executes some mathematical operation"
> >>>
> >>> def read_operate_save_file( file, ... ):
> >>>    """reads the file, does operations and save to an output file"""
> >>>    input = np.loadtxt( file )
> >>> [1] do_some_operation(   )
> >>>    np.savetxt( outfile, ..... )
> >>>
> >>> if __name__ == "__main__":
> >>>
> >>>    args = command_line_parsing( )
> >>>
> >>>    #parallelisation can be can chosen or not
> >>>    if args.parallel :
> >>>        #checks that Ipython is there, that an ipcluster has been
> started
> >>>        #initialises a Client and a load_balance_view. I can pass a
> string
> >>> or
> >>>        #list of strings to be executed on all engines (I use it to
> >>> "import xxx as x" )
> >>>        lview = IPp.start_load_balanced_view( to_execute )
> >>>
> >>>    if( args.parallel == False ):   #for serial computation
> >>> [2]     for fn in args.ifname:  #file name loop
> >>>            output = read_operate_save_file(fn, dis, **vars(args) )
> >>>        else:   #I want parallel computation
> >>> [3]         runs = [ lview.apply( read_operate_save_file,
> >>> os.path.abspath(fn.name), ... ) for fn in args.ifname ]
> >>>          results = [r.result for r in runs]
> >>>
> >>> ###### END example.py ######
> >>>
> >>> I have two questions:
> >>> [1] In function 'read_operate_save_file', I call 'do_some_operation'.
> >>> When I work in serial mode, everything works fine, but in parallel mode
> >>> I get the error
> >>> "IPython.parallel.error.RemoteError: NameError(global name
> >>> 'do_some_operation' is not defined)"
> >>> I'm not surprised by this, as I imagine that each engine knows only
> >>> what has been executed or defined before, and that lview.apply( func,
> >>> ... ) just passes "func" to the engines. A solution that I see is to
> >>> run "from example import do_some_operation" on the engines when
> >>> initialising the load_balance_view. Is there any easier/safer way?
> >>
> >>
> >>
> >> This namespace issue is common, and I have explanations scattered about
> the
> >> internet:
> >>
> >> http://stackoverflow.com/a/12307741/938949
> >> http://stackoverflow.com/a/10859394/938949
> >> https://github.com/ipython/ipython/issues/2489
> >> http://ipython.org/ipython-doc/dev/parallel/index.html
> >>
> >> Which I really need to consolidate into a single thorough explanation
> with
> >> examples.
> >>
> >> But the gist:
> >>
> >> - If a function is importable (e.g. in a module available both locally
> and
> >> remotely), then it's no problem
> >> - If it is defined in __main__ (e.g. in a script), then any references
> will
> >> be resolved in the *engine* namespace
> >>
> >> I recommend conforming to the first case if feasible, because then there
> >> should be no surprises.
> >> Everything surprising happens when you depend on references in
> >> `__main__` or the current working dir (e.g. locally imported modules),
> >> since `__main__` is not the same on the various machines, nor is the
> >> working dir (necessarily).
> >>
> >> That said, if the names you need to resolve are few, a simple
> import/push
> >> step with a DirectView to set up namespaces should be all you need
> prior to
> >> submitting tasks (assuming new engines are not arriving in
> mid-computation).
> >>
> >> e.g.:
> >>
> >> rc = Client()
> >> dv = rc[:]
> >> # push any locally defined functions that your task function uses:
> >> dv['do_some_operation'] = do_some_operation
> > I ended up doing the following when initialising the load_balance_view
> > dv.execute( 'import sys' )
> > dv.execute( 'sys.path.append("path_to_example.py")' )
> > dv.execute( 'from example import do_some_operation' )
> > Your suggestion looks much neater; just a couple of questions.
> > With the push that you suggest, do I simply call
> > 'do_some_operation' as in my example, or do I need some different
> > syntax?
> > Do you think one way or the other is more efficient when the
> > function is called and executed?
> >
> >> # perform any imports that are needed:
> >> dv.execute("import numpy as np...")
> >> # continue as before:
> >> lview = IPp.start_load_balanced_view( to_execute )
> >> ...
> >>
> >>
> >>>
> >>>
> >>> [2] Because of the way I parse my command line arguments, args.ifname
> >>> is a list of already opened files. In serial mode, this is no problem,
> >>> but when I assign the function to the scheduler passing the file, I get
> >>> an error saying that it cannot work on a closed file. If I pass the
> >>> file name with the absolute path, numpy can read it without problem.
> >>> Is this a behaviour to be expected or a bug?
> >>
> >>
> >> I would expect a PickleError when you try to send an open file.
>  Definitely
> >> send filenames, not open file objects.
> > Just a curiosity: what is the working directory of the engines? Is the
> > one where the ipcluster is started or where the profile is stored?
> > (While fixing my code, I ended up passing the filename with the full
> path)
>
> It depends on configuration and how you start the engines.  You can set
> this with your config files (look for work_dir), and you can view the
> current working dir with:
>
> rc[:].apply_sync(os.getcwdu)
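[A quick local illustration of why the working directory matters, with no cluster involved: a relative filename stops resolving as soon as the reader's cwd differs from the writer's, while an absolute path is immune — which is why passing the full path fixed the problem above. The directory names here are temporary stand-ins:]

```python
import os
import tempfile

client_dir = tempfile.mkdtemp()   # where the script runs
engine_dir = tempfile.mkdtemp()   # stand-in for an engine's work_dir

os.chdir(client_dir)
with open('data.txt', 'w') as f:
    f.write('42\n')

relative = 'data.txt'
absolute = os.path.abspath('data.txt')

os.chdir(engine_dir)              # simulate the engine's working dir
print(os.path.exists(relative))   # False: relative name no longer resolves
with open(absolute) as f:
    print(f.read().strip())       # 42: the absolute path still works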
>
> >
> > Thanks again,
> >
> > Francesco
> >
> >>
> >>>
> >>>
> >>> Thanks for any help,
> >>>
> >>> Cheers,
> >>> Francesco
> >>> _______________________________________________
> >>> IPython-User mailing list
> >>> IPython-User@scipy.org
> >>> http://mail.scipy.org/mailman/listinfo/ipython-user
> >>
> >>
> >>
> >> _______________________________________________
> >> IPython-User mailing list
> >> IPython-User@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/ipython-user
> > _______________________________________________
> > IPython-User mailing list
> > IPython-User@scipy.org
> > http://mail.scipy.org/mailman/listinfo/ipython-user
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 24 Oct 2012 19:52:01 -0600
> From: Jack Bryan <dtustudy68@hotmail.com>
> Subject: [IPython-User] error of install ipython 0.10.0
> To: <ipython-user@scipy.org>
> Message-ID: <BLU154-W3ED81D8D80D5A75D5A862CB7F0@phx.gbl>
> Content-Type: text/plain; charset="iso-8859-1"
>
>
> Hi,
> I am trying to install ipython 0.10.0 on Linux Red Hat 4.4.4-17,
> following the instructions at
> http://ipython.org/ipython-doc/stable/install/install.html :
>
> $ tar -xzf ipython.tar.gz
> $ cd ipython
> $ python setup.py install
>
> but I got:
>
> import pkg_resources
>     Zope.Interface: yes
>            Twisted: Not found (required for parallel computing capabilities)
>           Foolscap: Not found (required for parallel computing capabilities)
>            OpenSSL: 0.10
>             sphinx: Not found (required for building documentation)
>           pygments: Not found (required for syntax highlighting documentation)
>               nose: Not found (required for running the test suite)
>            pexpect: no (required for running standalone doctests)
> running install
> running build
> running build_py
> running build_scripts
> running install_lib
> running install_scripts
> changing mode of /home/user/usr/bin/ipengine to 755
> changing mode of /home/user/usr/bin/ipcontroller to 755
> changing mode of /home/user/usr/bin/ipcluster to 755
> changing mode of /home/user/usr/bin/ipython to 755
> changing mode of /home/user/usr/bin/ipythonx to 755
> changing mode of /home/user/usr/bin/ipython-wx to 755
> changing mode of /home/user/usr/bin/pycolor to 755
> changing mode of /home/user/usr/bin/irunner to 755
> changing mode of /home/user/usr/bin/iptest to 755
> running install_data
> running install_egg_info
> Removing /home/user/usr/lib/python2.6/site-packages/ipython-0.10.1-py2.6.egg-info
> Writing /home/user/usr/lib/python2.6/site-packages/ipython-0.10.1-py2.6.egg-info
>
> I have used easy_install to install nose, sphinx, pexpect and pygments.
> Why do I still have this error?
> Thanks
>
>
>
> ------------------------------
>
> _______________________________________________
> IPython-User mailing list
> IPython-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-user
>
>
> End of IPython-User Digest, Vol 108, Issue 29
> *********************************************
>

