[IPython-user] Cannot start ipcluster

Gökhan Sever gokhansever@gmail....
Sun Oct 18 01:55:22 CDT 2009


On Sun, Oct 18, 2009 at 1:31 AM, Brian Granger <ellisonbg.net@gmail.com> wrote:

>
>
>> Thanks for the suggestion. I was also thinking of experimenting with the
>> multiprocessing module as an alternative to IPython's approach. Actually,
>> my original intention is to rewrite the whole original code base in Python:
>> freeing the source code from the IDL license restriction and unifying it
>> under one umbrella (instead of using multiple languages: IDL, C, Perl, Bash,
>> Csh, etc.). More importantly, it would greatly reduce the amount of code.
>> The core processing code probably totals more than 25-30k lines in the
>> repo. I am assuming this would come down to the 5-6k level by employing
>> some parallel processing techniques and object orientation. However, this
>> is not an easy task, and I have papers to read and posters to prepare :)
>>
>>
> Yes, multiprocessing might be a very good option for you, especially if
> you just have a multicore workstation. But for basic map-style
> parallelism, either should be very easy. Currently, on a multicore CPU
> where multiprocessing uses fork, it probably has much lower overhead than
> IPython. But given the fairly coarse grain of what you are doing, I am not
> sure it would make a difference.
>


I have more patience to carry out this task with IPython :)


>
> There are a number of use cases where IPython excels, though:
>
> * If you want to scale up and run on a cluster that has a batch system,
> etc.
> * If you want to use it interactively from within IPython.
> * If you need to pass data between processes (using MPI). IPython has
> great integration with MPI through mpi4py.
> * If you really need robust exception propagation.
>
> Cheers,
>
> Brian
>
>
>
>
>> If you are curious about the project, please take a look at its SourceForge
>> entry at: http://sourceforge.net/projects/adpaa/
>>
>> and this is for source code analysis:
>> https://www.ohloh.net/p/adpaa/analyses/latest
>>
>> Besides the processing code, most of which is written in IDL, there is also
>> a GUI written in IDL. For some reason my local IDL license doesn't work with
>> Fedora 11, and my Cisco VPN client kills my regular internet access. I can
>> access the license server this way, but then I lose my net access :)
>>
>>
>>
>>
>>>
>>> On Sun, Oct 18, 2009 at 1:15 AM, Gökhan Sever <gokhansever@gmail.com>
>>> wrote:
>>> >
>>> >
>>> > On Sat, Oct 17, 2009 at 11:56 PM, Brian Granger <ellisonbg.net@gmail.com>
>>> > wrote:
>>> >>
>>> >> Does each .sea file take the same amount of time?
>>> >
>>> > No, it depends on the file size and content; each file takes a different
>>> > amount of time to process.
>>> >
>>> >>
>>> >> How many .sea files are there in total?
>>> >
>>> > 17 folders, so 17 sea files in total. Actually, the example dataset I am
>>> > using here is just a small subset of the original dataset. There may be
>>> > more than a couple hundred folders inside the main archive. For now my
>>> > main intention is only to parallelize (why does gmail underline this word
>>> > in red? Google itself suggests it as correct, but gmail warns :)) the
>>> > small subset.
>>> >
>>> >>
>>> >> How long for each .sea file?
>>> >
>>> > 3 to 5 minutes; again, it depends on the file size and content.
>>> >
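Because the per-file times vary (3 to 5 minutes each, per the answers above), how the files are handed to workers matters about as much as how many workers there are. The sketch below compares a static split, where each worker receives a fixed contiguous block of the file list up front, with dynamic scheduling, where whichever worker is free takes the next file. The durations are made up for illustration (9 files of 5 minutes followed by 8 of 3 minutes), and both function names are hypothetical.

```python
import heapq


def static_makespan(durations, workers):
    """Finish time when each worker gets a fixed contiguous block up front."""
    n = len(durations)
    size = -(-n // workers)  # ceiling division: jobs per worker
    blocks = [durations[i:i + size] for i in range(0, n, size)]
    return max(sum(block) for block in blocks)


def dynamic_makespan(durations, workers):
    """Finish time when the first idle worker always takes the next job."""
    free_at = [0.0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest-free worker
        heapq.heappush(free_at, start + d)
    return max(free_at)


durations = [5] * 9 + [3] * 8  # hypothetical minutes for 17 files
print(sum(durations))                   # 69 (one file after another)
print(static_makespan(durations, 2))    # 45
print(dynamic_makespan(durations, 2))   # 35.0
```

With these made-up numbers the static split leaves one worker idle for its last 21 minutes, while dynamic scheduling finishes close to the ideal 69 / 2 = 34.5 minutes.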
>>> >>
>>> >> What is the result?  A new file?
>>> >
>>> > A bunch of new files (50 to 100). For instance, a 60 MB sea file produces
>>> > ~430 MB of data when it is processed, mostly ASCII, but binary files are
>>> > output as well. The 430 MB also includes down-sampled data, created by
>>> > reducing 25 Hz data to 1 Hz equivalents. Later on, combined data are
>>> > produced: data from the early stages of processing are used to create
>>> > higher-level data, such as constructing concentration data from voltage
>>> > data by applying some equations, etc.
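The 25 Hz to 1 Hz down-sampling mentioned above can be pictured as a block average: each run of 25 consecutive samples (one second of data) collapses into a single value. This is a minimal sketch under the assumption that plain averaging is used; the actual ADPAA processing may reduce the data differently, and `downsample_block_mean` is a hypothetical name.

```python
def downsample_block_mean(samples, factor=25):
    """Average non-overlapping blocks of `factor` samples (e.g. 25 Hz -> 1 Hz).

    Trailing samples that do not fill a complete block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    n_blocks = len(samples) // factor
    return [
        sum(samples[i * factor:(i + 1) * factor]) / float(factor)
        for i in range(n_blocks)
    ]


# Three seconds of fake 25 Hz data -> three 1 Hz values
print(downsample_block_mean(list(range(75))))  # [12.0, 37.0, 62.0]
```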
>>> >
>>> > Hope it is more clear now.
>>> >
>>> >>
>>> >> Cheers,
>>> >>
>>> >> Brian
>>> >>
>>> >> On Sat, Oct 17, 2009 at 9:17 PM, Gökhan Sever <gokhansever@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>>
>>> >>> On Sat, Oct 17, 2009 at 10:58 PM, Brian Granger <ellisonbg.net@gmail.com>
>>> >>> wrote:
>>> >>>>
>>> >>>>
>>> >>>> On Sat, Oct 17, 2009 at 5:41 PM, Gökhan Sever <gokhansever@gmail.com>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> Hello,
>>> >>>>>
>>> >>>>> I want to experiment with IPython's parallel computing functionality.
>>> >>>>> So far I haven't made much progress, because ipcluster stalls at
>>> >>>>> startup, printing the following messages without dropping me into the
>>> >>>>> main IPython shell.
>>> >>>>>
>>> >>>>> My intention is to parallelize a small Python script that calls an
>>> >>>>> external set of scripts that process the dataset I have in hand. It is
>>> >>>>> not a hugely compute-intensive task, but on my Intel 2.5 GHz Dual
>>> >>>>> Core 2 it takes about 1.5 hours to process the whole dataset. Looking
>>> >>>>> at the system monitor, I see that the workload is not equally
>>> >>>>> distributed between the CPUs (one of them is usually much lazier than
>>> >>>>> the other). I am sure parallelizing the run would boost the processing
>>> >>>>> speed. In my dataset I have 17 folders, and each folder is independent
>>> >>>>> of the others. My script visits each folder and calls the main
>>> >>>>> external script via the subprocess module's call function. Processing
>>> >>>>> starts with the first folder and doesn't move on to the next folder
>>> >>>>> until the previous folder is finished. Basically, what I really want
>>> >>>>> is to put the externally called scripts into separate threads, so that
>>> >>>>> I don't need to wait for the previous job to finish.
>>> >>>>>
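Putting each external script call "into a separate thread", as described above, is workable because each thread only launches a child process and waits for it; the processing itself happens in those child processes, which the OS can spread across both cores. A minimal sketch using Python 3's `concurrent.futures.ThreadPoolExecutor` (not part of the Python 2.6 used in this thread); `run_jobs` and the `(folder, filename)` job format are hypothetical. Passing `cwd=` to `subprocess.call` replaces the `os.chdir()` calls in the script posted in this thread, since the current directory is shared by all threads of a process.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_jobs(jobs, command, max_workers=2):
    """Run `command + [filename]` inside each job's folder, several at once.

    `jobs` is a list of (folder, filename) pairs. Each worker thread just
    starts one external process and waits for it, so up to `max_workers`
    processes run at the same time. Returns the exit codes in job order.
    """
    def run_one(job):
        folder, filename = job
        # cwd= sets the child's working directory without touching ours
        return subprocess.call(command + [filename], cwd=folder)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, jobs))
```

For example, `run_jobs(jobs, ['postprocessing_saudi'], max_workers=2)` would keep at most two files in flight at a time on a dual-core machine.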
>>> >>>>> From the IPython parallel computing documentation, it seems like what
>>> >>>>> I want is doable in IPython. However, I need some advice on whether my
>>> >>>>> understanding is correct in this respect, and also on how to solve the
>>> >>>>> warning messages below.
>>> >>>>>
>>> >>>>
>>> >>>> Yes, I think it would work just fine for that. If you have the names of
>>> >>>> the folders and a function that will compute what you want, given the
>>> >>>> name of the folder, you should be able to just use MultiEngineClient.map.
>>> >>>
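One way to reshape the loop for a `map` call, as suggested above, is to split it into a function that lists the folders and a function that processes one folder given its name. A minimal sketch; `find_sea_folders` and `process_folder` are hypothetical names, and the default command `postprocessing_saudi` is taken from Gökhan's script in this thread.

```python
import os
import subprocess

COMMAND = ['postprocessing_saudi']  # external entry point from the thread


def find_sea_folders(top='.'):
    """Return the sorted absolute paths of folders under `top` holding a .sea file."""
    folders = []
    for root, _, files in os.walk(top):
        if any(name.endswith('.sea') for name in files):
            folders.append(os.path.abspath(root))
    return sorted(folders)


def process_folder(folder, command=COMMAND):
    """Process every .sea file in one folder and return the exit codes.

    Using cwd= instead of os.chdir() keeps the function free of global
    state, so several calls can run at the same time.
    """
    codes = []
    for name in sorted(os.listdir(folder)):
        if name.endswith('.sea'):
            codes.append(subprocess.call(command + [name], cwd=folder))
    return codes
```

Run serially, this is `results = list(map(process_folder, find_sea_folders('.')))`. In IPython 0.10 the parallel counterpart would be the `map` method of a `MultiEngineClient` (from `IPython.kernel.client`) connected to a running ipcluster, called with the same function and folder list; that variant is not shown or tested here, and the engines would need to see the same file system paths.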
>>> >>> This is the script in hand that I want to parallelize:
>>> >>>
>>> >>>
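>>> >>> import os
>>> >>> from subprocess import call
>>> >>>
>>> >>> init = os.getcwd()  # remember the top-level directory
>>> >>>
>>> >>> for root, dirs, files in os.walk('.'):
>>> >>>     dirs.sort()  # visit sub-folders in sorted order
>>> >>>     for file in files:
>>> >>>         if file.endswith('.sea'):
>>> >>>             print(file)
>>> >>>             os.chdir(root)
>>> >>>             print(os.getcwd())
>>> >>>             # call() blocks until the whole processing chain for this
>>> >>>             # file has finished, so folders are handled one by one
>>> >>>             call(['postprocessing_saudi', file])
>>> >>>             os.chdir(init)  # go back so the relative walk keeps working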
>>> >>>
>>> >>> I call this script from the top of the dataset folder hierarchy, and
>>> >>> whenever it encounters a file ending in "sea" it executes a set of
>>> >>> external scripts, starting with the postprocessing_saudi bash script and
>>> >>> going on with IDL, Perl and Python scripts until it finishes processing
>>> >>> that "sea" file, and so on until the directories are exhausted.
>>> >>>
>>> >>> If I can get the parallel functionality working, will I need to make
>>> >>> any changes to this code? If not, could you be a little more
>>> >>> descriptive about the use of MultiEngineClient.map?
>>> >>>
>>> >>> Thanks for your comments.
>>> >>>
>>> >>>
>>> >>>>
>>> >>>> Cheers,
>>> >>>>
>>> >>>> Brian
>>> >>>>
>>> >>>>>
>>> >>>>> Thanks.
>>> >>>>>
>>> >>>>>
>>> >>>>> [gsever@ccn Desktop]$ ipcluster local -n 4
>>> >>>>>
>>> >>>>> /usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>> >>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>> >>>>> instead
>>> >>>>>   import sha
>>> >>>>>
>>> >>>>> /usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>> >>>>> DeprecationWarning: the sets module is deprecated
>>> >>>>> 2009-10-17 18:59:37-0500 [-] Log opened.
>>> >>>>> 2009-10-17 18:59:37-0500 [-] Process ['ipcontroller',
>>> >>>>> '--logfile=/home/gsever/.ipython/log/ipcontroller'] has started with
>>> >>>>> pid=11066
>>> >>>>> 2009-10-17 18:59:37-0500 [-] Waiting for controller to finish
>>> >>>>> starting...
>>> >>>>> 2009-10-17 18:59:38-0500 [-]
>>> >>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>> >>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>> >>>>> instead\n  import sha\n'
>>> >>>>> 2009-10-17 18:59:38-0500 [-]
>>> >>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>> >>>>> DeprecationWarning: the sets module is deprecated\n'
>>> >>>>> 2009-10-17 18:59:39-0500 [-] Controller started
>>> >>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>> >>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>> >>>>> pid=11067
>>> >>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>> >>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>> >>>>> pid=11068
>>> >>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>> >>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>> >>>>> pid=11069
>>> >>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>> >>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>> >>>>> pid=11070
>>> >>>>> 2009-10-17 18:59:39-0500 [-] Engines started with pids: [11067, 11068,
>>> >>>>> 11069, 11070]
>>> >>>>> 2009-10-17 18:59:39-0500 [-]
>>> >>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>> >>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>> >>>>> instead\n  import sha\n'
>>> >>>>> 2009-10-17 18:59:39-0500 [-]
>>> >>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>> >>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>> >>>>> instead\n  import sha\n'
>>> >>>>> 2009-10-17 18:59:39-0500 [-]
>>> >>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>> >>>>> DeprecationWarning: the sets module is deprecated\n'
>>> >>>>> 2009-10-17 18:59:40-0500 [-]
>>> >>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>> >>>>> DeprecationWarning: the sets module is deprecated\n'
>>> >>>>> 2009-10-17 18:59:40-0500 [-]
>>> >>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>> >>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>> >>>>> instead\n  import sha\n'
>>> >>>>> 2009-10-17 18:59:40-0500 [-]
>>> >>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>> >>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>> >>>>> instead\n  import sha\n'
>>> >>>>> 2009-10-17 18:59:40-0500 [-]
>>> >>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>> >>>>> DeprecationWarning: the sets module is deprecated\n'
>>> >>>>> 2009-10-17 18:59:40-0500 [-]
>>> >>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>> >>>>> DeprecationWarning: the sets module is deprecated\n'
>>> >>>>>
>>> >>>>>
>>> >>>>> Here is my system info:
>>> >>>>>
>>> >>>>> ================================================================================
>>> >>>>> Platform     : Linux-2.6.29.6-217.2.3.fc11.i686.PAE-i686-with-fedora-11-Leonidas
>>> >>>>> Python       : ('CPython', 'tags/r26', '66714')
>>> >>>>> IPython      : 0.10
>>> >>>>> NumPy        : 1.4.0.dev
>>> >>>>>
>>> >>>>> ================================================================================
>>> >>>>>
>>> >>>>> --
>>> >>>>> Gökhan
>>> >>>>>
>>> >>>>> _______________________________________________
>>> >>>>> IPython-user mailing list
>>> >>>>> IPython-user@scipy.org
>>> >>>>> http://mail.scipy.org/mailman/listinfo/ipython-user
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Gökhan
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Gökhan
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> Gökhan
>>
>
>


-- 
Gökhan

