[IPython-user] Cannot start ipcluster

Brian Granger ellisonbg.net@gmail....
Sat Oct 17 23:56:29 CDT 2009


Does each .sea file take the same amount of time?
How many .sea files are there in total?
How long does each .sea file take to process?
What is the result?  A new file?

Cheers,

Brian

On Sat, Oct 17, 2009 at 9:17 PM, Gökhan Sever <gokhansever@gmail.com> wrote:

>
>
> On Sat, Oct 17, 2009 at 10:58 PM, Brian Granger <ellisonbg.net@gmail.com> wrote:
>
>>
>>
>> On Sat, Oct 17, 2009 at 5:41 PM, Gökhan Sever <gokhansever@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I want to experiment with IPython's parallel computing functionality. So
>>> far I haven't made much progress, because ipcluster startup stalls with the
>>> following messages and never drops me into the main IPython shell.
>>>
>>> My intention is to parallelize a small Python script that calls an external
>>> set of scripts to process the dataset I have in hand. It is not a hugely
>>> demanding computation, but on my 2.5 GHz dual-core Intel Core 2 it takes
>>> about 1.5 hours to process the whole dataset. Looking at the system monitor,
>>> I see that the workload is not evenly distributed between the CPUs (one of
>>> them is usually much less busy than the other). I am sure parallelizing the
>>> run would speed up the processing. My dataset has 17 folders, and each
>>> folder is independent of the others. My script visits each folder and
>>> calls the main external script via the subprocess module's call function.
>>> Processing starts with the first folder and does not move on to the next
>>> folder until the previous one is finished. Basically, what I really want is
>>> to put the externally called scripts into separate threads, so that I don't
>>> need to wait for the previous job to finish before the next one starts.
>>>
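[Editor's sketch] The request above, running the externally called scripts side by side instead of one after another, can be illustrated with the standard library alone. This is only a minimal sketch under stated assumptions, not the IPython approach discussed later in the thread: run_job and run_concurrently are hypothetical helper names, workers=2 is a guess based on the dual-core machine described, and postprocessing_saudi is assumed to be on the PATH and safe to run in several independent folders at once.

```python
import subprocess
from multiprocessing.pool import ThreadPool


def run_job(job, command="postprocessing_saudi"):
    """Run the external command on one file; return its exit code.

    `job` is a (folder, filename) pair. Passing cwd= makes the child process
    start in that folder, so no os.chdir is needed. os.chdir changes the
    directory of the whole process and is therefore unsafe with threads.
    """
    folder, name = job
    return subprocess.call([command, name], cwd=folder)


def run_concurrently(jobs, workers=2, runner=run_job):
    """Run all jobs with at most `workers` active at once.

    Returns the results in the same order as `jobs`. Threads are sufficient
    here because each thread only waits for an external process to finish.
    """
    pool = ThreadPool(workers)
    try:
        return pool.map(runner, jobs)
    finally:
        pool.close()
        pool.join()
```

A call such as run_concurrently([("folder01", "a.sea"), ("folder02", "b.sea")]) (folder and file names made up for illustration) would start the second folder while the first is still being processed.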
>>> From the IPython parallel computing documentation, it seems that what I
>>> want is doable in IPython. However, I need some advice on whether my
>>> understanding is correct, and also on how to resolve the warning messages
>>> below.
>>>
>>>
>> Yes, I think it would work just fine for that.  If you have the names of
>> the folders and a function that will compute what you want, given the name
>> of the folder, you should be able to just use MultiEngineClient.map.
>>
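[Editor's sketch] The suggestion above (a list of names plus a function that processes one name, handed to MultiEngineClient.map) might look roughly like the following. It is a hedged sketch, not a tested recipe: list_sea_files, process_sea and run_parallel are hypothetical names, the import path IPython.kernel.client is assumed for the IPython 0.10 release mentioned in this thread, and the code maps over individual .sea files rather than folders. The mapper argument exists only so the same code can be tried with the built-in map when no cluster is running.

```python
import os


def list_sea_files(top):
    """Return absolute paths of every .sea file below `top`."""
    paths = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # walk sub-folders in a predictable order
        for name in sorted(files):
            if name.endswith(".sea"):
                paths.append(os.path.abspath(os.path.join(root, name)))
    return paths


def process_sea(path):
    """Process one .sea file and return the external script's exit code."""
    # Imports are repeated inside the function on the assumption that it
    # runs on a remote engine, which does not share this module's globals.
    import os
    from subprocess import call
    folder, name = os.path.split(path)
    return call(["postprocessing_saudi", name], cwd=folder)


def run_parallel(paths, func=process_sea, mapper=None):
    """Apply `func` to every path, using the engines unless `mapper` is given."""
    if mapper is None:
        # Assumed IPython 0.10 API; requires a running ipcluster.
        from IPython.kernel import client
        mec = client.MultiEngineClient()
        mapper = mec.map
    return list(mapper(func, paths))
```

Absolute paths are used so that the result does not depend on the directory each engine happens to be started in.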
>
> This is the script in hand that I want to parallelize:
>
>
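> import os
> from subprocess import call
>
> # Remember the starting directory so we can return to it after each file.
> init = os.getcwd()
>
> for root, dirs, files in os.walk('.'):
>     dirs.sort()  # visit sub-folders in a predictable order
>     for name in files:
>         if name.endswith('.sea'):
>             print(name)
>             os.chdir(root)  # run the external script from the file's folder
>             print(os.getcwd())
>             call(['postprocessing_saudi', name])
>             os.chdir(init)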
>
> I call this script from the top of the dataset folder hierarchy. Whenever
> it encounters a file ending in ".sea", it runs a set of external scripts,
> starting with the postprocessing_saudi bash script and continuing with IDL,
> Perl, and Python scripts until that file is fully processed. It then moves
> on to the next file, and so on until all the directories are exhausted.
>
> If I can get the parallel functionality working, will I need to make any
> changes to this code? If not, could you be a little more descriptive about
> the use of MultiEngineClient.map?
>
> Thanks for your comments.
>
>
>
>>
>> Cheers,
>>
>> Brian
>>
>>
>>> Thanks.
>>>
>>>
>>> [gsever@ccn Desktop]$  ipcluster local -n 4
>>> /usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>> instead
>>>   import sha
>>> /usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>> DeprecationWarning: the sets module is deprecated
>>> 2009-10-17 18:59:37-0500 [-] Log opened.
>>> 2009-10-17 18:59:37-0500 [-] Process ['ipcontroller',
>>> '--logfile=/home/gsever/.ipython/log/ipcontroller'] has started with
>>> pid=11066
>>> 2009-10-17 18:59:37-0500 [-] Waiting for controller to finish starting...
>>> 2009-10-17 18:59:38-0500 [-]
>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>> instead\n  import sha\n'
>>> 2009-10-17 18:59:38-0500 [-]
>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>> DeprecationWarning: the sets module is deprecated\n'
>>> 2009-10-17 18:59:39-0500 [-] Controller started
>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>> pid=11067
>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>> pid=11068
>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>> pid=11069
>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>> pid=11070
>>> 2009-10-17 18:59:39-0500 [-] Engines started with pids: [11067, 11068,
>>> 11069, 11070]
>>> 2009-10-17 18:59:39-0500 [-]
>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>> instead\n  import sha\n'
>>> 2009-10-17 18:59:39-0500 [-]
>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>> instead\n  import sha\n'
>>> 2009-10-17 18:59:39-0500 [-]
>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>> DeprecationWarning: the sets module is deprecated\n'
>>> 2009-10-17 18:59:40-0500 [-]
>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>> DeprecationWarning: the sets module is deprecated\n'
>>> 2009-10-17 18:59:40-0500 [-]
>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>> instead\n  import sha\n'
>>> 2009-10-17 18:59:40-0500 [-]
>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>> instead\n  import sha\n'
>>> 2009-10-17 18:59:40-0500 [-]
>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>> DeprecationWarning: the sets module is deprecated\n'
>>> 2009-10-17 18:59:40-0500 [-]
>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>> DeprecationWarning: the sets module is deprecated\n'
>>>
>>>
>>> Here is my system info:
>>>
>>> ================================================================================
>>> Platform     :
>>> Linux-2.6.29.6-217.2.3.fc11.i686.PAE-i686-with-fedora-11-Leonidas
>>> Python       : ('CPython', 'tags/r26', '66714')
>>> IPython      : 0.10
>>> NumPy      : 1.4.0.dev
>>>
>>> ================================================================================
>>>
>>> --
>>> Gökhan
>>>
>>> _______________________________________________
>>> IPython-user mailing list
>>> IPython-user@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/ipython-user
>>>
>>>
>>
>
>
> --
> Gökhan
>