[IPython-user] Cannot start ipcluster

Gökhan Sever gokhansever@gmail....
Sun Oct 18 21:50:30 CDT 2009


On Sun, Oct 18, 2009 at 8:12 PM, Gökhan Sever <gokhansever@gmail.com> wrote:

>
>
> On Sun, Oct 18, 2009 at 7:15 PM, Gökhan Sever <gokhansever@gmail.com>wrote:
>
>>
>>
>> 2009/10/18 Brian Granger <ellisonbg.net@gmail.com>
>>
>>> Looks like you have been making progress...some comments:
>>>
>>> * Something quite odd is going on.  While it would be nice if you could
>>> get 2.4-2.7 speedup on a dual core
>>> system, I don't think that result is real.  I am not sure why you are
>>> seeing this, but it is *extremely* rare
>>> to see a speedup greater than the number of cores.  It is possible, but I
>>> don't think your problem has
>>> any of the characteristics that would make it so.
>>>
>>
> You are right in your suspicion. I was making a clean run on each file,
> that is, deleting everything except the .sea files in the folders. With this
> configuration the multiprocessing module's pooling approach doesn't work: it
> cannot branch into the external script completely. However, when I leave the
> processed outputs in the folders and run the script, it works and takes much
> less time than IPython's parallelism. Now the question is how to explain this
> behaviour.
>
> End of my 2.4 to 2.7X speed-up happiness :)
>
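
One way to narrow this down is to have each worker return the exit status and
stderr of the external call, so that Pool.map collects a per-file record of what
actually happened on a clean run. A rough sketch (process_all_verbose is
illustrative; postprocessing_saudi and the [path, file] pairs are from the
script further down the thread):

from subprocess import Popen, PIPE
import os


def process_all_verbose(pf):
    # pf is a [path, filename] pair, as in the original process_all
    os.chdir(pf[0])
    proc = Popen(['postprocessing_saudi', pf[1]], stdout=PIPE, stderr=PIPE)
    out, err = proc.communicate()
    # hand the per-file outcome back to the parent process for inspection
    return pf[1], proc.returncode, err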

Posted a question on stackoverflow to get some external responses.

http://stackoverflow.com/questions/1586754/using-multiprocessing-pool-of-workers



>
>
>
>>> * From your description of the problem, IPython should be giving you
>>> nearly a 2x speedup, but what you are seeing is quite a bit lower.
>>>
>>> The combination of these things makes me think there is an aspect of all
>>> of this we are not understanding yet.
>>> I am suspecting that the method you are using to time your code is not
>>> accurate.  I have seen this type of
>>> thing before.  Can you time it using a more accurate approach?  Something
>>> like:
>>>
>>> from timeit import default_timer as clock
>>>
>>> t1 = clock()
>>> ....
>>> t2 = clock()
>>>
>>> It is possible that IPython is slower than multiprocessing in this case,
>>> but something else is going on here.
>>>
>>> Cheers,
>>>
>>>
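>> For reference, the suggested pattern made self-contained (process_all_files
>> is only a placeholder for the actual per-file work):
>>
>> from timeit import default_timer as clock
>>
>>
>> def process_all_files():
>>     # placeholder: loop over the .sea files and run the postprocessing
>>     pass
>>
>>
>> t1 = clock()
>> process_all_files()
>> t2 = clock()
>> print "Duration: ", t2 - t1
>>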
>> Here are new benchmark results (in seconds) using your suggested timing
>> approach:
>>
>> 0-) Duration using the linear processing:  1048.07685399
>>
>> 1-) Duration using TaskClient and 2 Engines:  701.550107956
>>
>> 2-) Duration using MultiEngineClient and 2 Engines:  663.629260063
>>
>> 3-) I can't get timings using this method when I use the multiprocessing
>> module.
>>
>> I will send my 4 scripts to your email for further investigation. So far,
>> the results don't seem much different from what they were originally.
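>>
>> For item 2, the call pattern is roughly the following (a sketch only, not the
>> actual script; it assumes the IPython 0.10 kernel.client interface with the
>> engines already started via ipcluster, and process_all and pathfile as in the
>> multiprocessing script quoted below):
>>
>> from timeit import default_timer as clock
>> from IPython.kernel import client
>>
>> mec = client.MultiEngineClient()    # connect to the running controller
>>
>> t1 = clock()
>> result = mec.map(process_all, pathfile)
>> t2 = clock()
>> print "Duration using MultiEngineClient: ", t2 - t1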
>>
>>
>>
>>
>>> Brian
>>>
>>>
>>> On Sun, Oct 18, 2009 at 2:01 PM, Gökhan Sever <gokhansever@gmail.com>wrote:
>>>
>>>>
>>>>
>>>> On Sun, Oct 18, 2009 at 2:34 PM, Gökhan Sever <gokhansever@gmail.com>wrote:
>>>>
>>>>>
>>>>> Moreeeeee speed-up :)
>>>>>
>>>>> Next step is to use multiprocessing module.
>>>>>
>>>>
>>>> I did two tests since I was not sure which timing to believe:
>>>>
>>>> real    6m37.591s
>>>> user    10m16.450s
>>>> sys    0m4.808s
>>>>
>>>> real    7m22.209s
>>>> user    11m21.296s
>>>> sys    0m5.540s
>>>>
>>>> from which I figured out that "real" is the number I want to look at.  So the
>>>> improvement with respect to the original linear 18m 5s run is a 2.4 to 2.7X
>>>> speed-up on a dual-core 2.5 GHz laptop using Python's multiprocessing
>>>> module, which is great for only adding a few lines of code and slightly
>>>> modifying my original process_all wrapper script.
>>>>
>>>> Here is the code:
>>>>
>>>>
>>>> #!/usr/bin/env python
>>>>
>>>> """
>>>> Execute postprocessing_saudi script in parallel using multiprocessing
>>>> module.
>>>> """
>>>>
>>>> from multiprocessing import Pool
>>>> from subprocess import call
>>>> import os
>>>>
>>>>
>>>> def find_sea_files():
>>>>     """Walk the current directory tree and collect every .sea file name
>>>>     together with the absolute path of the folder that contains it."""
>>>>     file_list, path_list = [], []
>>>>     init = os.getcwd()
>>>>
>>>>     for root, dirs, files in os.walk('.'):
>>>>         dirs.sort()
>>>>         for file in files:
>>>>             if file.endswith('.sea'):
>>>>                 file_list.append(file)
>>>>                 # step into the folder to record its absolute path
>>>>                 os.chdir(root)
>>>>                 path_list.append(os.getcwd())
>>>>                 os.chdir(init)
>>>>
>>>>     return file_list, path_list
>>>>
>>>>
>>>> def process_all(pf):
>>>>     # pf is a [path, filename] pair; run the external script in that folder
>>>>     os.chdir(pf[0])
>>>>     call(['postprocessing_saudi', pf[1]])
>>>>
>>>>
>>>> if __name__ == '__main__':
>>>>     pool = Pool(processes=2)              # start 2 worker processes
>>>>     files, paths = find_sea_files()
>>>>     pathfile = [[paths[i], files[i]] for i in range(len(files))]
>>>>     pool.map(process_all, pathfile)
>>>>
>>>>
>>>> The main difference is the map call: Pool.map takes only one iterable
>>>> argument, so the path and the file name are packed into a single pair for
>>>> each task. This approach also shows the execution results on the terminal
>>>> screen, unlike IPython's. I am assuming that, like IPython, the
>>>> multiprocessing module should be able to run on external nodes, which means
>>>> that once I can set up a few fast external machines I can perform a few
>>>> more tests.
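>>>>
>>>> As an aside, the pathfile pairs can equivalently be built with zip, which
>>>> pairs the two lists directly (illustrative only; same behaviour as the list
>>>> comprehension above):
>>>>
>>>> pathfile = zip(paths, files)    # [(path_1, file_1), (path_2, file_2), ...]
>>>> pool.map(process_all, pathfile)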
>>>>
>>>> --
>>>> Gökhan
>>>>
>>>
>>>
>>
>>
>> --
>> Gökhan
>>
>
>
>
> --
> Gökhan
>



-- 
Gökhan