[IPython-user] Cannot start ipcluster

Kenneth Arnold kenneth.arnold@gmail....
Sun Oct 18 00:32:10 CDT 2009


IPython's distributed computing facilities are powerful, but they can also
be confusing. You may find the built-in multiprocessing module easier
to get started with (see
http://docs.python.org/dev/library/multiprocessing.html#examples --
the one entitled "An [sic] showing how to use queues to feed tasks to
a collection of worker process"). This is a simple enough use case
that you should hopefully be able to avoid the bugs in the
multiprocessing module ;)
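
For what it's worth, the queue-fed worker pattern from that docs example
might look roughly like the sketch below for this job. It is only a sketch
under assumptions: the names worker, run_tasks and STOP are made up for
illustration, each task is assumed to be a (directory, argv) pair, and the
external command is run with subprocess.call using cwd= rather than
os.chdir, so that workers do not change each other's working directory.

```python
import subprocess
from multiprocessing import Process, Queue

STOP = None  # sentinel (an assumption of this sketch) telling a worker to exit


def worker(task_queue, done_queue):
    """Pull (directory, argv) tasks off the queue until the sentinel arrives."""
    for directory, argv in iter(task_queue.get, STOP):
        try:
            # cwd= applies only to the child process being launched.
            returncode = subprocess.call(argv, cwd=directory)
        except OSError:
            # The command could not be started; report it instead of dying.
            returncode = None
        done_queue.put((directory, argv, returncode))


def run_tasks(tasks, workers=2):
    """Run every task on a fixed number of worker processes."""
    tasks = list(tasks)
    task_queue, done_queue = Queue(), Queue()
    for task in tasks:
        task_queue.put(task)
    for _ in range(workers):
        task_queue.put(STOP)  # one sentinel per worker
    procs = [Process(target=worker, args=(task_queue, done_queue))
             for _ in range(workers)]
    for proc in procs:
        proc.start()
    # Collect one result per task before joining the workers.
    results = [done_queue.get() for _ in tasks]
    for proc in procs:
        proc.join()
    return results
```

Each worker takes the next task as soon as it is free, so files that need
3 minutes and files that need 5 balance out on their own. run_tasks should
be called from under an `if __name__ == "__main__":` guard, as the
multiprocessing docs recommend.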

This reminds me that I had some questions and issues from back when I
was trying to use TaskClient -- I'll bring that up sometime. (In
short, good real-world examples, like how to deal with nontrivial code
and data, would be super-excellent.)

-Ken



On Sun, Oct 18, 2009 at 1:15 AM, Gökhan Sever <gokhansever@gmail.com> wrote:
>
>
> On Sat, Oct 17, 2009 at 11:56 PM, Brian Granger <ellisonbg.net@gmail.com>
> wrote:
>>
>> Does each .sea file take the same amount of time?
>
> No. It depends on the file size and content, so each file takes a different
> amount of time to process.
>
>>
>> How many .sea files are there in total?
>
> 17 folders, so 17 .sea files in total. The example dataset I am using
> here is just a small subset of the original dataset; there may be more than
> a couple hundred folders inside the main archive. For now my main
> intention is only to parallelize the small subset. (Why does Gmail underline
> this word in red? Google itself suggests it as correct, but Gmail warns :))
>
>>
>> How long for each .sea file?
>
> 3 to 5 minutes, again depending on the file size and content.
>
>>
>> What is the result?  A new file?
>
> A bunch of new files (50 to 100). For instance, a 60 MB .sea file produces
> ~430 MB of data when it is processed, mostly ASCII, but some binary files
> are output as well. The 430 MB also includes down-sampled data, where
> 25 Hz data are reduced to 1 Hz equivalents, and combined data built later
> from the early stages of processing to create higher-level data, such as
> concentration data constructed from voltage data by applying some equations.
>
> I hope it is clearer now.
>
>>
>> Cheers,
>>
>> Brian
>>
>> On Sat, Oct 17, 2009 at 9:17 PM, Gökhan Sever <gokhansever@gmail.com>
>> wrote:
>>>
>>>
>>> On Sat, Oct 17, 2009 at 10:58 PM, Brian Granger <ellisonbg.net@gmail.com>
>>> wrote:
>>>>
>>>>
>>>> On Sat, Oct 17, 2009 at 5:41 PM, Gökhan Sever <gokhansever@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I want to experiment with IPython's parallel computing functionality. So
>>>>> far I haven't made much progress, because ipcluster stalls at startup,
>>>>> printing the following messages without dropping me into the main IPython
>>>>> shell.
>>>>>
>>>>> My intention is to parallelise a small Python script that calls an
>>>>> external set of scripts that process the dataset I have in hand. It is not a
>>>>> hugely compute-intensive task, but on my 2.5 GHz dual-core Intel Core 2 it
>>>>> takes about 1.5 hours to process the whole dataset. Looking at the system
>>>>> monitor, I see that the workload is not equally distributed between the CPUs
>>>>> (one of them is usually much lazier than the other). I am sure parallelizing
>>>>> the run would boost the processing speed. My dataset has 17 folders, each
>>>>> independent of the others. My script visits each folder and calls the main
>>>>> external script via the subprocess module's call function. Processing starts
>>>>> with the first folder and does not move on to the next folder until the
>>>>> previous one is finished. Basically, what I really want is to run the
>>>>> externally called scripts in separate threads, so that I don't need to wait
>>>>> for the previous job to finish before the next one starts.
>>>>>
>>>>> From the IPython parallel computing documentation, it seems that what I
>>>>> want is doable in IPython. However, I need some advice on whether my
>>>>> understanding is correct, and on how to resolve the warning messages
>>>>> below.
>>>>>
>>>>
>>>> Yes, I think it would work just fine for that.  If you have the names of
>>>> the folders and a function that will compute what you want, given the name
>>>> of the folder, you should be able to just use MultiEngineClient.map
>>>
>>> This is the script in hand that I want to parallelize:
>>>
>>>
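>>> import os
>>> from subprocess import call
>>>
>>> # Remember the starting directory so we can return to it after each file.
>>> init = os.getcwd()
>>>
>>> for root, dirs, files in os.walk('.'):
>>>     dirs.sort()  # visit folders in sorted order
>>>     for name in files:
>>>         if name.endswith('.sea'):
>>>             print(name)
>>>             os.chdir(root)
>>>             print(os.getcwd())
>>>             # Run the external pipeline on this file from inside its folder.
>>>             call(['postprocessing_saudi', name])
>>>             os.chdir(init)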
>>>
>>> I call this script from the top of the dataset folder hierarchy, and
>>> whenever a file ending in "sea" is encountered it executes a set of
>>> external scripts, starting with the postprocessing_saudi bash script. It
>>> then goes on with IDL, Perl, and Python scripts until it finishes processing
>>> that "sea" file, and so on until the directories are exhausted.
>>>
>>> If I can get the parallel functionality working, will I need to make any
>>> changes to this code? If not, could you be a little more descriptive about
>>> the use of MultiEngineClient.map?
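
A sketch of the kind of change this would probably need, whichever map ends
up being used: split the script into a list of tasks and a function that
handles one task and depends only on its argument. The names find_sea_files,
process_one and process_all are hypothetical, and the builtin map below is
just a serial stand-in; it assumes nothing about MultiEngineClient.map beyond
the (function, sequence) shape Brian describes.

```python
import os
from subprocess import call


def find_sea_files(top="."):
    """Collect a (directory, filename) pair for every .sea file below top."""
    found = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # visit folders in sorted order, as the original script does
        for name in sorted(files):
            if name.endswith(".sea"):
                found.append((os.path.abspath(root), name))
    return found


def process_one(task, command="postprocessing_saudi"):
    """Process a single .sea file and return the command's exit code."""
    directory, name = task
    # cwd= replaces the os.chdir() pair, so the function has no global
    # side effects and tasks can run independently of one another.
    return call([command, name], cwd=directory)


def process_all(top="."):
    """Serial version: apply process_one to every task with the builtin map."""
    return list(map(process_one, find_sea_files(top)))
```

Once the work has this shape, swapping the builtin map for a parallel one is
a one-line change, and the original loop's behaviour is still available
through process_all.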
>>>
>>> Thanks for your comments.
>>>
>>>
>>>>
>>>> Cheers,
>>>>
>>>> Brian
>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> [gsever@ccn Desktop]$  ipcluster local -n 4
>>>>>
>>>>> /usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>>>> instead
>>>>>   import sha
>>>>>
>>>>> /usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>>>> DeprecationWarning: the sets module is deprecated
>>>>> 2009-10-17 18:59:37-0500 [-] Log opened.
>>>>> 2009-10-17 18:59:37-0500 [-] Process ['ipcontroller',
>>>>> '--logfile=/home/gsever/.ipython/log/ipcontroller'] has started with
>>>>> pid=11066
>>>>> 2009-10-17 18:59:37-0500 [-] Waiting for controller to finish
>>>>> starting...
>>>>> 2009-10-17 18:59:38-0500 [-]
>>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>>>> instead\n  import sha\n'
>>>>> 2009-10-17 18:59:38-0500 [-]
>>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>>>> DeprecationWarning: the sets module is deprecated\n'
>>>>> 2009-10-17 18:59:39-0500 [-] Controller started
>>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>>>> pid=11067
>>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>>>> pid=11068
>>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>>>> pid=11069
>>>>> 2009-10-17 18:59:39-0500 [-] Process ['ipengine',
>>>>> '--logfile=/home/gsever/.ipython/log/ipengine11066-'] has started with
>>>>> pid=11070
>>>>> 2009-10-17 18:59:39-0500 [-] Engines started with pids: [11067, 11068,
>>>>> 11069, 11070]
>>>>> 2009-10-17 18:59:39-0500 [-]
>>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>>>> instead\n  import sha\n'
>>>>> 2009-10-17 18:59:39-0500 [-]
>>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>>>> instead\n  import sha\n'
>>>>> 2009-10-17 18:59:39-0500 [-]
>>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>>>> DeprecationWarning: the sets module is deprecated\n'
>>>>> 2009-10-17 18:59:40-0500 [-]
>>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>>>> DeprecationWarning: the sets module is deprecated\n'
>>>>> 2009-10-17 18:59:40-0500 [-]
>>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>>>> instead\n  import sha\n'
>>>>> 2009-10-17 18:59:40-0500 [-]
>>>>> '/usr/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg/twisted/python/filepath.py:12:
>>>>> DeprecationWarning: the sha module is deprecated; use the hashlib module
>>>>> instead\n  import sha\n'
>>>>> 2009-10-17 18:59:40-0500 [-]
>>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>>>> DeprecationWarning: the sets module is deprecated\n'
>>>>> 2009-10-17 18:59:40-0500 [-]
>>>>> '/usr/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg/foolscap/banana.py:2:
>>>>> DeprecationWarning: the sets module is deprecated\n'
>>>>>
>>>>>
>>>>> Here is my system info:
>>>>>
>>>>> ================================================================================
>>>>> Platform     :
>>>>> Linux-2.6.29.6-217.2.3.fc11.i686.PAE-i686-with-fedora-11-Leonidas
>>>>> Python       : ('CPython', 'tags/r26', '66714')
>>>>> IPython      : 0.10
>>>>> NumPy      : 1.4.0.dev
>>>>>
>>>>> ================================================================================
>>>>>
>>>>> --
>>>>> Gökhan
>>>>>
>>>>> _______________________________________________
>>>>> IPython-user mailing list
>>>>> IPython-user@scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/ipython-user
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Gökhan
>>
>
>
>
> --
> Gökhan
>
>
>

