[IPython-user] Cannot start ipcluster

Gökhan Sever gokhansever@gmail....
Sun Oct 18 01:11:07 CDT 2009


On Sun, Oct 18, 2009 at 1:00 AM, Brian Granger <ellisonbg.net@gmail.com> wrote:

> If the files take different amounts of time, you want load balancing. That
> is provided by the TaskClient:
>
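> import os
> from IPython.kernel.client import TaskClient
>
> def find_sea_files(path):
>     # return a list of full paths to the .sea files below path
>     return [os.path.join(root, f)
>             for root, dirs, files in os.walk(path)
>             for f in files if f.endswith('.sea')]
>
> def compute_sea_files(sea_file):
>     # do all the computations for a single sea file, inside its own folder;
>     # imports are repeated here in case the engine has not imported them
>     import os
>     from subprocess import call
>     folder, name = os.path.split(sea_file)
>     return call(['postprocessing_saudi', name], cwd=folder)
>
> tc = TaskClient()
>
> path = '/path/to/dataset'  # placeholder: top of the dataset folders
> files = find_sea_files(path)
>
> tc.map(compute_sea_files, files)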
>
> This will block until everything is done.  You may have to play around with
> the path stuff, but this should give you the basic idea.
>
> To test it, I would simply write a version that uses Python's builtin map
> function.  tc.map works the same way, but it is parallel and load balanced.
>
> ...now if you could just get things started...
>
> Brian


Does this still apply even though I can't get the cluster started with ipcluster local -n 4?

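Brian's suggestion to test with Python's builtin map works without any cluster, so it is a useful first step even while ipcluster won't start. A minimal sketch, assuming each task is identified by the full path of one .sea file; find_sea_files, compute_sea_file and run_serial are hypothetical names, and by default the postprocessing_saudi command is only returned (a dry run), not executed:

```python
import os
import subprocess


def find_sea_files(top):
    """Return full paths of all *.sea files below top (folders visited in sorted order)."""
    found = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # visit sub-folders in a stable, sorted order
        for fname in sorted(files):
            if fname.endswith('.sea'):
                found.append(os.path.join(root, fname))
    return found


def compute_sea_file(sea_path, dry_run=True):
    """Process one .sea file inside its own folder.

    With dry_run=True the (folder, command) pair is returned instead of
    running the external script, so the mapping can be checked cheaply.
    """
    folder, fname = os.path.split(sea_path)
    cmd = ['postprocessing_saudi', fname]
    if dry_run:
        return (folder, cmd)
    # cwd= runs the script in the file's folder without calling os.chdir
    return subprocess.call(cmd, cwd=folder)


def run_serial(top, dry_run=True):
    """Serial test: one function applied to a list of files with builtin map."""
    files = find_sea_files(top)
    return list(map(lambda p: compute_sea_file(p, dry_run), files))
```

Calling run_serial('.') from the top of the dataset lists what would run; run_serial('.', dry_run=False) processes the files one after another, as the original script does. Per Brian's description, tc.map takes the same (function, sequence) arguments once the cluster is up.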

>
>
> On Sat, Oct 17, 2009 at 10:15 PM, Gökhan Sever <gokhansever@gmail.com> wrote:
>
>>
>>
>> On Sat, Oct 17, 2009 at 11:56 PM, Brian Granger <ellisonbg.net@gmail.com> wrote:
>>
>>> Does each .sea file take the same amount of time?
>>>
>>
>> No, it depends on the file size and content; each file takes a different
>> amount of time to process.
>>
>>
>>> How many .sea files are there in total?
>>>
>>
>> 17 folders, so 17 .sea files in total. Actually, the example dataset I am
>> using here is just a small subset of the original dataset; there may be more
>> than a couple hundred folders inside the main archive. For now my main
>> intention is only to parallelize (why does Gmail underline this word in red?
>> Google itself suggests it as correct, but Gmail warns :)) the processing of
>> this small subset.
>>
>>
>>> How long for each .sea file?
>>>
>>
>> 3 to 5 minutes; again, it depends on the file size and content.
>>
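Since each file takes 3 to 5 minutes, how the files are handed out matters. A small sketch comparing two schedules: a static split fixed in advance (file i goes to worker i mod n), and dynamic dispatch, where each file goes to whichever worker frees up first, which is the idea behind a load-balanced map. The per-file durations are hypothetical, picked only from the 3 to 5 minute range quoted here; they are not measurements.

```python
import heapq


def static_makespan(durations, n_workers):
    """Finish time when file i is assigned in advance to worker i % n_workers."""
    loads = [0.0] * n_workers
    for i, d in enumerate(durations):
        loads[i % n_workers] += d
    return max(loads)


def dynamic_makespan(durations, n_workers):
    """Finish time when each file goes to the first worker that becomes free."""
    free_at = [0.0] * n_workers  # min-heap of the times workers become free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + d)
    return max(free_at)


# Hypothetical per-file times in minutes (illustrative, not measured).
minutes = [5, 3, 5, 3, 5, 3]
print(sum(minutes), static_makespan(minutes, 2), dynamic_makespan(minutes, 2))
# -> 24 15.0 13.0
```

With these made-up numbers two workers finish in 13 minutes with dynamic dispatch versus 15 with the fixed split; the real gain depends on the actual mix of file sizes.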
>>
>>> What is the result?  A new file?
>>>
>>
>> A bunch of new files (50 to 100). For instance, a 60 MB sea file produces
>> ~430 MB of data when processed, mostly ASCII, but binary files are written
>> as well. The 430 MB also includes down-sampled data, where the 25 Hz data
>> are reduced to 1 Hz equivalents. Later on, combined data are created from
>> the early stages of the processing to produce higher-level data, such as
>> applying some equations to voltage data to construct concentration data,
>> etc.
>>
>> I hope that makes it clearer.
>>
>>
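The 25 Hz to 1 Hz down-sampling mentioned above can be illustrated with simple block averaging. This is an assumption for illustration only: the thread doesn't show the IDL, Perl and Python processing scripts, and they may filter or decimate differently. downsample_mean is a hypothetical name.

```python
def downsample_mean(samples, factor=25):
    """Average consecutive blocks of `factor` samples (25 Hz -> 1 Hz by default).

    A trailing partial block (fewer than `factor` samples) is dropped.
    """
    n_blocks = len(samples) // factor
    return [sum(samples[i * factor:(i + 1) * factor]) / float(factor)
            for i in range(n_blocks)]
```

For example, one minute of 25 Hz data (1500 samples) becomes 60 one-second means.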
>>>
>>> Cheers,
>>>
>>> Brian
>>>
>>>
>>> On Sat, Oct 17, 2009 at 9:17 PM, Gökhan Sever <gokhansever@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Sat, Oct 17, 2009 at 10:58 PM, Brian Granger <ellisonbg.net@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Sat, Oct 17, 2009 at 5:41 PM, Gökhan Sever <gokhansever@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I want to experiment with IPython's parallel computing functionality. So
>>>>>> far I haven't made much progress, because ipcluster stalls after printing
>>>>>> the following messages and never drops me into the main IPython shell.
>>>>>>
>>>>>> My intention is to parallelize a small Python script that calls an
>>>>>> external set of scripts to process the dataset I have on hand. It is not a
>>>>>> hugely compute-intensive task, but on my 2.5 GHz Intel dual-core (Core 2)
>>>>>> machine it takes about 1.5 hours to process the whole dataset. Looking at
>>>>>> the system monitor, I see that the workload is not evenly distributed
>>>>>> between the CPUs (one of them is usually much lazier than the other). I am
>>>>>> sure parallelizing the runs would speed up the processing. My dataset has
>>>>>> 17 folders, each independent of the others. My script visits each folder
>>>>>> and calls the main external script via the subprocess module's call
>>>>>> function. Processing starts with the first folder and doesn't move on to
>>>>>> the next folder until the previous one has finished. Basically, what I
>>>>>> really want is to run the externally called scripts in separate threads,
>>>>>> so that I don't have to wait for the previous job to finish before the
>>>>>> next one starts.
>>>>>>
>>>>>> From the IPython parallel computing documentation, it seems that what I
>>>>>> want is doable in IPython. However, I would like some advice on whether my
>>>>>> understanding is correct, and on how to resolve the warning messages
>>>>>> below.
>>>>>>
>>>>>>
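Gökhan's own idea of running the external scripts in separate threads can also be sketched with the standard library alone, which sidesteps the ipcluster problem. This swaps in multiprocessing.pool.ThreadPool, a different technique from IPython's TaskClient. Threads are enough here because each thread only waits on its external process, and imap_unordered with a chunksize of 1 hands the next folder to whichever thread is free first. The (folder, command) job format and the names run_job and run_all are hypothetical.

```python
import subprocess
from multiprocessing.pool import ThreadPool


def run_job(job):
    """Run one external command in its folder; return (folder, exit code)."""
    folder, cmd = job
    # The thread only waits on its child process, so the real work runs
    # outside the Python interpreter and threads are sufficient.
    return folder, subprocess.call(cmd, cwd=folder)


def run_all(jobs, n_threads=2):
    """Run (folder, command) jobs n_threads at a time.

    imap_unordered with chunksize=1 hands each job to the first free thread,
    so a slow folder does not hold up the others.
    """
    pool = ThreadPool(n_threads)
    try:
        results = list(pool.imap_unordered(run_job, jobs, 1))
    finally:
        pool.close()
        pool.join()
    return dict(results)
```

For the script in this thread, the jobs would be pairs like (root, ['postprocessing_saudi', fname]) collected during the os.walk loop; with n_threads=2 on a dual-core machine, two folders are processed at a time.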
>>>>> Yes, I think it would work just fine for that. If you have the names of
>>>>> the folders and a function that computes what you want given the name of
>>>>> a folder, you should be able to just use MultiEngineClient.map.
>>>>>
>>>>
>>>> This is the script I want to parallelize:
>>>>
>>>>
>>>> import os
>>>> from subprocess import call
>>>>
>>>> init = os.getcwd()
>>>>
>>>> for root, dirs, files in os.walk('.'):
>>>>     dirs.sort()
>>>>     for fname in files:
>>>>         if fname.endswith('.sea'):
>>>>             print(fname)
>>>>             os.chdir(root)
>>>>             print(os.getcwd())
>>>>             call(['postprocessing_saudi', fname])
>>>>             os.chdir(init)
>>>>
>>>> I call this script from the top of the dataset folder hierarchy. Whenever
>>>> it encounters a file ending in ".sea", it runs a set of external scripts,
>>>> starting with the postprocessing_saudi bash script and continuing with
>>>> IDL, Perl and Python scripts until that .sea file is fully processed, and
>>>> so on until all the directories have been visited.
>>>>
>>>> If I can get the parallel functionality working, will I need to make any
>>>> changes to this code? If not, could you be a little more descriptive about
>>>> the use of MultiEngineClient.map?
>>>>
>>>> Thanks for your comments.
>>>>
>>>>
>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Brian
>>>>>
>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> [gsever@ccn Desktop]$  ipcluster local -n 4
>>>>>> /usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>>>>> instead
>>>>>>   import sha
>>>>>> /usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>>>>> DeprecationWarning: the sets module is deprecated
>>>>>> 2009-10-17 18:59:37-0500 [-] Log opened.
>>>>>> 2009-10-17 18:59:37-0500 [-] Process ['ipcontroller',
>>>>>> '--logfile=/home/gsever/.ipython/log/ipcontroller'] has started with
>>>>>> pid=11066
>>>>>> 2009-10-17 18:59:37-0500 [-] Waiting for controller to finish
>>>>>> starting...
>>>>>> 2009-10-17 18:59:38-0500 [-]
>>>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>>>>> instead\n  import sha\n'
>>>>>> 2009-10-17 18:59:38-0500 [-]
>>>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>>>>> DeprecationWarning: the sets module is deprecated\n'
>>>>>> 2009-10-17 18:59:39-0500 [-] Controller started
>>>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>>>>> pid=11067
>>>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>>>>> pid=11068
>>>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>>>>> pid=11069
>>>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>>>>> pid=11070
>>>>>> 2009-10-17 18:59:39-0500 [-] Engines started with pids: [11067, 11068,
>>>>>> 11069, 11070]
>>>>>> 2009-10-17 18:59:39-0500 [-]
>>>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>>>>> instead\n  import sha\n'
>>>>>> 2009-10-17 18:59:39-0500 [-]
>>>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>>>>> instead\n  import sha\n'
>>>>>> 2009-10-17 18:59:39-0500 [-]
>>>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>>>>> DeprecationWarning: the sets module is deprecated\n'
>>>>>> 2009-10-17 18:59:40-0500 [-]
>>>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>>>>> DeprecationWarning: the sets module is deprecated\n'
>>>>>> 2009-10-17 18:59:40-0500 [-]
>>>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>>>>> instead\n  import sha\n'
>>>>>> 2009-10-17 18:59:40-0500 [-]
>>>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>>>>> instead\n  import sha\n'
>>>>>> 2009-10-17 18:59:40-0500 [-]
>>>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>>>>> DeprecationWarning: the sets module is deprecated\n'
>>>>>> 2009-10-17 18:59:40-0500 [-]
>>>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>>>>> DeprecationWarning: the sets module is deprecated\n'
>>>>>>
>>>>>>
>>>>>> Here is my system info:
>>>>>>
>>>>>> ================================================================================
>>>>>> Platform     : Linux-2.6.29.6-217.2.3.fc11.i686.PAE-i686-with-fedora-11-Leonidas
>>>>>> Python       : ('CPython', 'tags/r26', '66714')
>>>>>> IPython      : 0.10
>>>>>> NumPy        : 1.4.0.dev
>>>>>>
>>>>>> ================================================================================
>>>>>>
>>>>>> --
>>>>>> Gökhan
>>>>>>
>>>>>> _______________________________________________
>>>>>> IPython-user mailing list
>>>>>> IPython-user@scipy.org
>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-user
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Gökhan
>>>>
>>>
>>>
>>
>>
>> --
>> Gökhan
>>
>
>


-- 
Gökhan

