[IPython-user] Cannot start ipcluster

Brian Granger ellisonbg.net@gmail....
Sun Oct 18 01:31:18 CDT 2009


> Thanks for the suggestion. The multiprocessing module was also on my mind
> as an alternative to IPython's approach to experiment with. Actually, my
> original intention is to port the whole code base to Python: freeing the
> source code from the IDL license restriction, unifying it under one umbrella
> (instead of using multiple languages: IDL, C, Perl, Bash, Csh, etc.), and,
> more importantly, greatly reducing the amount of code. The core processing
> code probably totals more than 25-30k lines in the repo. I am assuming this
> would come down to the 5-6k level by employing some parallel processing
> techniques and object orientation. However, this is not an easy undertaking,
> and I need papers to read and posters to prepare :)
>
>
Yes, multiprocessing might be a very good option for you, especially if you
just have a multicore workstation.  For basic map-style parallelism,
either should be very easy.  Currently, on a multicore CPU where
multiprocessing uses fork, it probably has a much lower overhead than
IPython.  But given the fairly coarse granularity of what you are doing, I am
not sure it would make a difference.
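
As a rough sketch of the map-style approach using only the standard library
(untested against the real processing scripts): the helper names
find_sea_files, process_one and process_all are hypothetical, the code is
written for Python 3, and the postprocessing_saudi command is taken from the
script quoted later in this thread. Because each job only waits on an
external process, the sketch uses multiprocessing.pool.ThreadPool, which
offers the same map interface as multiprocessing.Pool.

```python
import os
import subprocess
from functools import partial
from multiprocessing.pool import ThreadPool


def find_sea_files(top="."):
    """Return (folder, filename) pairs for every .sea file below `top`."""
    found = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # visit folders in a stable order
        for name in sorted(files):
            if name.endswith(".sea"):
                found.append((os.path.abspath(root), name))
    return found


def process_one(job, command=("postprocessing_saudi",)):
    """Run the external command on one .sea file inside its own folder."""
    folder, name = job
    # cwd= replaces os.chdir(), which is not safe with concurrent workers.
    returncode = subprocess.call(list(command) + [name], cwd=folder)
    return folder, name, returncode


def process_all(top=".", workers=2, command=("postprocessing_saudi",)):
    """Process every .sea file below `top`, `workers` files at a time."""
    jobs = find_sea_files(top)
    worker = partial(process_one, command=command)
    with ThreadPool(workers) as pool:
        # imap_unordered hands out one job at a time, so a slow file
        # does not hold up the files queued behind it.
        return sorted(pool.imap_unordered(worker, jobs))


if __name__ == "__main__":
    for folder, name, returncode in process_all("."):
        print(folder, name, "exit code", returncode)
```

The workers=2 default is only a guess that matches a dual-core machine.
Since the files take different amounts of time, handing out one job at a time
should keep both cores busy until the last file is done.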

There are a number of use cases where IPython excels, though:

* If you want to scale up and run on a cluster that has a batch system, etc.
* If you want to use it interactively from within IPython.
* If you need to pass data between processes (using MPI).  IPython has
great integration with MPI through mpi4py.
* If you really need robust exception propagation.

Cheers,

Brian




> If you are curious about the project please take a look at its SourceForge
> entry at: http://sourceforge.net/projects/adpaa/
>
> and this is for source code analysis:
> https://www.ohloh.net/p/adpaa/analyses/latest
>
> Besides most of the processing code being written in IDL, there is also a
> GUI written in IDL. For some reason my local IDL license doesn't work with
> Fedora 11, and my Cisco VPN client kills my regular internet access. I can
> access the license server this way, but then I lose my net access :)
>
>
>
>
>>
>> On Sun, Oct 18, 2009 at 1:15 AM, Gökhan Sever <gokhansever@gmail.com>
>> wrote:
>> >
>> >
>> > On Sat, Oct 17, 2009 at 11:56 PM, Brian Granger <ellisonbg.net@gmail.com>
>> > wrote:
>> >>
>> >> Does each .sea file take the same amount of time?
>> >
>> > No, it depends on the file size and content; each file takes a different
>> > amount of time to process.
>> >
>> >>
>> >> How many .sea files are there in total?
>> >
>> > 17 folders, so 17 sea files in total. Actually, the example dataset I am
>> > using here is just a small subset of the original dataset. There may be
>> > more than a couple hundred folders inside the main archive. For now my
>> > main intention is only to parallelize the small subset (why does Gmail
>> > underline this word in red? Google itself suggests it as right, but Gmail
>> > warns :)).
>> >
>> >>
>> >> How long for each .sea file?
>> >
>> > 3 to 5 minutes, again depends on the file size and content.
>> >
>> >>
>> >> What is the result?  A new file?
>> >
>> > A bunch of new files (50 to 100). For instance, a 60 MB sea file produces
>> > ~430 MB of data when it is processed, mostly ASCII, but binary files are
>> > output as well. The 430 MB also includes down-sampled data, created by
>> > reducing 25 Hz data to 1 Hz equivalents. Later on there is combined data,
>> > which uses data from the early stages of the processing to create
>> > higher-level data, such as using voltage data to construct concentration
>> > data by applying some equations, etc.
>> >
>> > Hope it is more clear now.
>> >
>> >>
>> >> Cheers,
>> >>
>> >> Brian
>> >>
>> >> On Sat, Oct 17, 2009 at 9:17 PM, Gökhan Sever <gokhansever@gmail.com>
>> >> wrote:
>> >>>
>> >>>
>> >>> On Sat, Oct 17, 2009 at 10:58 PM, Brian Granger <ellisonbg.net@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>>
>> >>>> On Sat, Oct 17, 2009 at 5:41 PM, Gökhan Sever <gokhansever@gmail.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> Hello,
>> >>>>>
>> >>>>> I want to experiment with IPython's parallel computing functionality.
>> >>>>> So far I couldn't make much progress, because ipcluster stalls at
>> >>>>> startup, giving the following messages without dropping me into the
>> >>>>> main IPython shell.
>> >>>>>
>> >>>>> My intention is to parallelize a small Python script that calls an
>> >>>>> external set of scripts that process the dataset I have in hand. It
>> >>>>> is not a hugely compute-intensive task, but on my Intel 2.5 GHz Dual
>> >>>>> Core 2 it takes about 1.5 hours to process the whole dataset. Looking
>> >>>>> at the system monitor, I see that the workload is not equally
>> >>>>> distributed between the CPUs (one of them is usually much lazier than
>> >>>>> the other). I am sure parallelizing the code run would boost the
>> >>>>> processing speed. In my dataset I have 17 folders, and each folder is
>> >>>>> independent of the others. My script visits each folder and calls the
>> >>>>> main external script via the subprocess module's call function.
>> >>>>> Processing starts with the first folder, and doesn't move on to the
>> >>>>> next folder until the previous folder is finished. Basically, what I
>> >>>>> really want is to put the externally called scripts into separate
>> >>>>> threads, so that I don't need to wait for the previous job to be done
>> >>>>> during the processing.
>> >>>>>
>> >>>>> From the IPython parallel computing documentation, it seems like what
>> >>>>> I want is doable in IPython. However, I need some advice on whether
>> >>>>> my understanding is correct, and also on how to resolve the warning
>> >>>>> messages below.
>> >>>>>
>> >>>>
>> >>>> Yes, I think it would work just fine for that.  If you have the names
>> >>>> of the folders and a function that will compute what you want, given
>> >>>> the name of the folder, you should be able to just use
>> >>>> MultiEngineClient.map
>> >>>
>> >>> This is the script in hand that I want to parallelize:
>> >>>
>> >>>
>> >>> import os
>> >>> from subprocess import call
>> >>>
>> >>> init = os.getcwd()
>> >>>
>> >>> for root, dirs, files in os.walk('.'):
>> >>>     dirs.sort()
>> >>>     for file in files:
>> >>>         if file.endswith('.sea'):
>> >>>             print file
>> >>>             os.chdir(root)
>> >>>             print os.getcwd()
>> >>>             call(['postprocessing_saudi', file])
>> >>>             os.chdir(init)
>> >>>
>> >>> I call this script from the top of the dataset folder hierarchy, and
>> >>> whenever a file ending in ".sea" is encountered it executes a set of
>> >>> external scripts, starting with the postprocessing_saudi bash script. It
>> >>> goes on with IDL, Perl, and Python scripts until it finishes processing
>> >>> that "sea" file, and so on and so forth until the directories are
>> >>> exhausted.
>> >>>
>> >>> If I can get the parallel functionality working, will I need to make any
>> >>> changes to this code? If not, could you be a little more descriptive on
>> >>> the use of MultiEngineClient.map?
>> >>>
>> >>> Thanks for your comments.
>> >>>
>> >>>
>> >>>>
>> >>>> Cheers,
>> >>>>
>> >>>> Brian
>> >>>>
>> >>>>>
>> >>>>> Thanks.
>> >>>>>
>> >>>>>
>> >>>>> [gsever@ccn Desktop]$  ipcluster local -n 4
>> >>>>>
>> >>>>> /usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12: DeprecationWarning: the sha module is deprecated; use the hashlib module instead
>> >>>>>   import sha
>> >>>>>
>> >>>>> /usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2: DeprecationWarning: the sets module is deprecated
>> >>>>> 2009-10-17 18:59:37-0500 [-] Log opened.
>> >>>>> 2009-10-17 18:59:37-0500 [-] Process ['ipcontroller', '--logfile=/home/gsever/.ipython/log/ipcontroller'] has started with pid=11066
>> >>>>> 2009-10-17 18:59:37-0500 [-] Waiting for controller to finish starting...
>> >>>>> 2009-10-17 18:59:38-0500 [-] '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12: DeprecationWarning: the sha module is deprecated; use the hashlib module instead\n  import sha\n'
>> >>>>> 2009-10-17 18:59:38-0500 [-] '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2: DeprecationWarning: the sets module is deprecated\n'
>> >>>>> 2009-10-17 18:59:39-0500 [-] Controller started
>> >>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine', '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with pid=11067
>> >>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine', '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with pid=11068
>> >>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine', '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with pid=11069
>> >>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine', '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with pid=11070
>> >>>>> 2009-10-17 18:59:39-0500 [-] Engines started with pids: [11067, 11068, 11069, 11070]
>> >>>>> 2009-10-17 18:59:39-0500 [-] '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12: DeprecationWarning: the sha module is deprecated; use the hashlib module instead\n  import sha\n'
>> >>>>> 2009-10-17 18:59:39-0500 [-] '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12: DeprecationWarning: the sha module is deprecated; use the hashlib module instead\n  import sha\n'
>> >>>>> 2009-10-17 18:59:39-0500 [-] '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2: DeprecationWarning: the sets module is deprecated\n'
>> >>>>> 2009-10-17 18:59:40-0500 [-] '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2: DeprecationWarning: the sets module is deprecated\n'
>> >>>>> 2009-10-17 18:59:40-0500 [-] '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12: DeprecationWarning: the sha module is deprecated; use the hashlib module instead\n  import sha\n'
>> >>>>> 2009-10-17 18:59:40-0500 [-] '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12: DeprecationWarning: the sha module is deprecated; use the hashlib module instead\n  import sha\n'
>> >>>>> 2009-10-17 18:59:40-0500 [-] '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2: DeprecationWarning: the sets module is deprecated\n'
>> >>>>> 2009-10-17 18:59:40-0500 [-] '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2: DeprecationWarning: the sets module is deprecated\n'
>> >>>>>
>> >>>>>
>> >>>>> Here is my system info:
>> >>>>>
>> >>>>>
>> >>>>> ================================================================================
>> >>>>> Platform     : Linux-2.6.29.6-217.2.3.fc11.i686.PAE-i686-with-fedora-11-Leonidas
>> >>>>> Python       : ('CPython', 'tags/r26', '66714')
>> >>>>> IPython      : 0.10
>> >>>>> NumPy        : 1.4.0.dev
>> >>>>> ================================================================================
>> >>>>>
>> >>>>> --
>> >>>>> Gökhan
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> IPython-user mailing list
>> >>>>> IPython-user@scipy.org
>> >>>>> http://mail.scipy.org/mailman/listinfo/ipython-user
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Gökhan
>> >>
>> >
>> >
>> >
>> > --
>> > Gökhan
>> >
>> >
>> >
>>
>
>
>
> --
> Gökhan
>