[IPython-user] Cannot start ipcluster

Gökhan Sever gokhansever@gmail....
Sun Oct 18 01:20:54 CDT 2009


On Sun, Oct 18, 2009 at 12:32 AM, Kenneth Arnold
<kenneth.arnold@gmail.com> wrote:

> ipython's distributed computing facilities are powerful, but can also
> be confusing. You may find the builtin multiprocessing module easier
> to get started with (see
> http://docs.python.org/dev/library/multiprocessing.html#examples --
> the one entitled "An [sic] showing how to use queues to feed tasks to
> a collection of worker process"). This is a simple enough use case
> that you should hopefully be able to avoid the bugs in the
> multiprocessing module ;)
>
> This reminds me that I had some questions and issues from back when I
> was trying to use TaskClient -- I'll bring that up sometime. (In
> short, good real-world examples, like how to deal with nontrivial code
> and data, would be super-excellent.)
>
> -Ken
>
>
>
Thanks for the suggestion. I also had the multiprocessing module in mind as
an alternative to IPython's approach. Actually, my original intention is to
port the whole code base to Python: freeing the source code from the IDL
license restriction, unifying it under one umbrella (instead of using
multiple languages: IDL, C, Perl, Bash, Csh, etc.) and, more importantly,
greatly reducing the amount of code. The core processing code probably
totals more than 25-30k lines in the repo. I assume this would come down to
the 5-6k level, while also employing some parallel processing techniques and
object orientation. However, this is not an easy task, and I need papers
to read and posters to prepare :)
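
For reference, here is a minimal sketch of what the multiprocessing route
could look like for the script quoted further down. It is only an
illustration under some assumptions: the helper names (find_sea_files,
process, run_all) are made up for this example, the worker count of 2 simply
matches a dual-core machine, and postprocessing_saudi is assumed to be on the
PATH and to behave the same when started with cwd= as it does after
os.chdir().

```python
import os
from multiprocessing import Pool
from subprocess import call


def find_sea_files(top):
    """Return (folder, filename) pairs for every .sea file under top."""
    found = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # visit the folders in a fixed order
        for name in sorted(files):
            if name.endswith('.sea'):
                found.append((root, name))
    return found


def process(job):
    """Run the external command on one file, from inside its folder."""
    command, folder, name = job
    # cwd= stands in for the os.chdir() calls, so the parent process
    # never changes its own working directory.
    return call([command, name], cwd=folder)


def run_all(top='.', workers=2, command='postprocessing_saudi'):
    """Process every .sea file under top and return the exit codes."""
    jobs = [(command, folder, name) for folder, name in find_sea_files(top)]
    if not jobs:
        return []
    pool = Pool(processes=workers)
    try:
        # chunksize=1 hands out one file at a time, which helps when
        # the files take different amounts of time to process.
        return pool.map(process, jobs, 1)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    print(run_all())
```

Since each folder is independent, the worker processes do not need to share
anything; the list returned by run_all() only carries the exit codes of the
external script, in the same order as the files were found.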

If you are curious about the project please take a look at its SourceForge
entry at: http://sourceforge.net/projects/adpaa/

and this is for source code analysis:
https://www.ohloh.net/p/adpaa/analyses/latest

Besides the processing code, most of which is written in IDL, there is also
a GUI written in IDL. For some reason my local IDL license doesn't work with
Fedora 11, and my Cisco VPN client kills my regular internet access. I can
reach the license server this way, but then I lose my net access :)




>
> On Sun, Oct 18, 2009 at 1:15 AM, Gökhan Sever <gokhansever@gmail.com>
> wrote:
> >
> >
> > On Sat, Oct 17, 2009 at 11:56 PM, Brian Granger <ellisonbg.net@gmail.com>
> > wrote:
> >>
> >> Does each .sea file take the same amount of time?
> >
> > No, it depends on the file size and content; each file takes a different
> > amount of time to process.
> >
> >>
> >> How many .sea files are there in total?
> >
> > 17 folders, so 17 .sea files in total. Actually, the example dataset I am
> > using here is just a small subset of the original dataset. There may be
> > more than a couple hundred folders inside the main archive. For now, my
> > main intention is only to parallelize the small subset. (Why does Gmail
> > underline this word in red? Google itself suggests it as correct, but
> > Gmail warns :))
> >
> >>
> >> How long for each .sea file?
> >
> > 3 to 5 minutes, again depends on the file size and content.
> >
> >>
> >> What is the result?  A new file?
> >
> > A bunch of new files (50 to 100). For instance, a 60 MB .sea file produces
> > ~430 MB of data when it is processed, mostly ASCII, but binary files are
> > output as well. The 430 MB also includes down-sampled data, where 25 Hz
> > data is reduced to 1 Hz equivalents. Later on come combined data, which
> > use data from early stages of the processing to create higher-level data,
> > such as using voltage data to construct concentration data by applying
> > some equations, etc.
> >
> > Hope it is clearer now.
> >
> >>
> >> Cheers,
> >>
> >> Brian
> >>
> >> On Sat, Oct 17, 2009 at 9:17 PM, Gökhan Sever <gokhansever@gmail.com>
> >> wrote:
> >>>
> >>>
> >>> On Sat, Oct 17, 2009 at 10:58 PM, Brian Granger <ellisonbg.net@gmail.com>
> >>> wrote:
> >>>>
> >>>>
> >>>> On Sat, Oct 17, 2009 at 5:41 PM, Gökhan Sever <gokhansever@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> I want to experiment with IPython's parallel computing functionality.
> >>>>> So far I couldn't make much progress, because ipcluster startup stalls,
> >>>>> giving the following messages without dropping me into the main IPython
> >>>>> shell.
> >>>>>
> >>>>> My intention is to parallelize a small Python script that calls an
> >>>>> external set of scripts that process the dataset I have in hand. It is
> >>>>> not a hugely demanding computing task, but on my 2.5 GHz dual-core Intel
> >>>>> Core 2 it takes about 1.5 hours to process the whole dataset. Looking at
> >>>>> the system monitor, I see that the workload is not equally distributed
> >>>>> between the CPUs (one of them is usually much lazier than the other). I
> >>>>> am sure parallelizing the run would boost the processing speed. My
> >>>>> dataset has 17 folders, and each folder is independent of the others. My
> >>>>> script visits each folder and calls the main external script via the
> >>>>> subprocess module's call function. Processing starts with the first
> >>>>> folder and doesn't move on to the next folder until the previous one is
> >>>>> finished. Basically, what I really want is to put the externally called
> >>>>> scripts into separate threads, so that I don't need to wait for the
> >>>>> previous job to finish during processing.
> >>>>>
> >>>>> From the IPython parallel computing documentation, it seems that what
> >>>>> I want is doable in IPython. However, I need some advice on whether my
> >>>>> understanding is correct, and also on how to resolve the warning
> >>>>> messages below.
> >>>>>
> >>>>
> >>>> Yes, I think it would work just fine for that.  If you have the names
> >>>> of the folders and a function that will compute what you want, given
> >>>> the name of the folder, you should be able to just use
> >>>> MultiEngineClient.map
> >>>
> >>> This is the script in hand that I want to parallelize:
> >>>
> >>>
> >>> import os
> >>> from subprocess import call
> >>>
> >>> init = os.getcwd()  # remember where we started
> >>>
> >>> for root, dirs, files in os.walk('.'):
> >>>     dirs.sort()  # visit the folders in a fixed order
> >>>     for name in files:
> >>>         if name.endswith('.sea'):
> >>>             print(name)
> >>>             os.chdir(root)
> >>>             print(os.getcwd())
> >>>             call(['postprocessing_saudi', name])
> >>>             os.chdir(init)
> >>>
> >>> I call this script from the top of the dataset folder hierarchy, and
> >>> whenever a file ending in "sea" is encountered, it executes a set of
> >>> external scripts, starting with the postprocessing_saudi bash script. It
> >>> goes on with IDL, Perl, and Python scripts until it finishes processing
> >>> that "sea" file, and so on until the directories are exhausted.
> >>>
> >>> If I can get the parallel functionality working, will I need to make any
> >>> changes to this code? If not, could you be a little more descriptive
> >>> about the use of MultiEngineClient.map?
> >>>
> >>> Thanks for your comments.
> >>>
> >>>
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Brian
> >>>>
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>>
> >>>>> [gsever@ccn Desktop]$  ipcluster local -n 4
> >>>>>
> >>>>>
> /usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
> >>>>> DeprecationWarning: the sha module is deprecated; use the hashlib
> module
> >>>>> instead
> >>>>>   import sha
> >>>>>
> >>>>>
> /usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
> >>>>> DeprecationWarning: the sets module is deprecated
> >>>>> 2009-10-17 18:59:37-0500 [-] Log opened.
> >>>>> 2009-10-17 18:59:37-0500 [-] Process ['ipcontroller',
> >>>>> '--logfile=/home/gsever/.ipython/log/ipcontroller'] has started with
> >>>>> pid=11066
> >>>>> 2009-10-17 18:59:37-0500 [-] Waiting for controller to finish
> >>>>> starting...
> >>>>> 2009-10-17 18:59:38-0500 [-]
> >>>>>
> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
> >>>>> DeprecationWarning: the sha module is deprecated; use the hashlib
> module
> >>>>> instead\n  import sha\n'
> >>>>> 2009-10-17 18:59:38-0500 [-]
> >>>>>
> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
> >>>>> DeprecationWarning: the sets module is deprecated\n'
> >>>>> 2009-10-17 18:59:39-0500 [-] Controller started
> >>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
> >>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started
> with
> >>>>> pid=11067
> >>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
> >>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started
> with
> >>>>> pid=11068
> >>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
> >>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started
> with
> >>>>> pid=11069
> >>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
> >>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started
> with
> >>>>> pid=11070
> >>>>> 2009-10-17 18:59:39-0500 [-] Engines started with pids: [11067,
> 11068,
> >>>>> 11069, 11070]
> >>>>> 2009-10-17 18:59:39-0500 [-]
> >>>>>
> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
> >>>>> DeprecationWarning: the sha module is deprecated; use the hashlib
> module
> >>>>> instead\n  import sha\n'
> >>>>> 2009-10-17 18:59:39-0500 [-]
> >>>>>
> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
> >>>>> DeprecationWarning: the sha module is deprecated; use the hashlib
> module
> >>>>> instead\n  import sha\n'
> >>>>> 2009-10-17 18:59:39-0500 [-]
> >>>>>
> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
> >>>>> DeprecationWarning: the sets module is deprecated\n'
> >>>>> 2009-10-17 18:59:40-0500 [-]
> >>>>>
> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
> >>>>> DeprecationWarning: the sets module is deprecated\n'
> >>>>> 2009-10-17 18:59:40-0500 [-]
> >>>>>
> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
> >>>>> DeprecationWarning: the sha module is deprecated; use the hashlib
> module
> >>>>> instead\n  import sha\n'
> >>>>> 2009-10-17 18:59:40-0500 [-]
> >>>>>
> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
> >>>>> DeprecationWarning: the sha module is deprecated; use the hashlib
> module
> >>>>> instead\n  import sha\n'
> >>>>> 2009-10-17 18:59:40-0500 [-]
> >>>>>
> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
> >>>>> DeprecationWarning: the sets module is deprecated\n'
> >>>>> 2009-10-17 18:59:40-0500 [-]
> >>>>>
> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
> >>>>> DeprecationWarning: the sets module is deprecated\n'
> >>>>>
> >>>>>
> >>>>> Here is my system info:
> >>>>>
> >>>>>
> ================================================================================
> >>>>> Platform     :
> >>>>> Linux-2.6.29.6-217.2.3.fc11.i686.PAE-i686-with-fedora-11-Leonidas
> >>>>> Python       : ('CPython', 'tags/r26', '66714')
> >>>>> IPython      : 0.10
> >>>>> NumPy      : 1.4.0.dev
> >>>>>
> >>>>>
> ================================================================================
> >>>>>
> >>>>> --
> >>>>> Gökhan
> >>>>>
> >>>>> _______________________________________________
> >>>>> IPython-user mailing list
> >>>>> IPython-user@scipy.org
> >>>>> http://mail.scipy.org/mailman/listinfo/ipython-user
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Gökhan
> >>
> >
> >
> >
> > --
> > Gökhan
> >
> >
>



-- 
Gökhan