[IPython-user] Cannot start ipcluster

Gökhan Sever gokhansever@gmail....
Sun Oct 18 20:12:07 CDT 2009


On Sun, Oct 18, 2009 at 7:15 PM, Gökhan Sever <gokhansever@gmail.com> wrote:

>
>
> 2009/10/18 Brian Granger <ellisonbg.net@gmail.com>
>
>> Looks like you have been making progress...some comments:
>>
>> * Something quite odd is going on.  While it would be nice if you could
>> get a 2.4-2.7x speedup on a dual core
>> system, I don't think that result is real.  I am not sure why you are
>> seeing this, but it is *extremely* rare
>> to see a speedup greater than the number of cores.  It is possible, but I
>> don't think your problem has
>> any of the characteristics that would make it so.
>>
>
You are right in your suspicion. I was making a clean run on each file, that
is, deleting everything except the .sea files in the folders. With this
configuration the multiprocessing module's pooling approach doesn't work; it
cannot branch into the external script completely. However, when I leave the
processed outputs in the folders and run the script, it works and takes much
less time than IPython's parallelism. Now the question is how to explain this
behaviour.

End of my 2.4 to 2.7X speed-up happiness :)



>> * From your description of the problem, IPython should be giving you nearly
>> a 2x speedup, but it is quite a bit
>> lower.
>>
>> The combination of these things makes me think there is an aspect of all
>> of this we are not understanding yet.
>> I suspect that the method you are using to time your code is not
>> accurate.  I have seen this type of
>> thing before.  Can you time it using a more accurate approach?  Something
>> like:
>>
>> from timeit import default_timer as clock
>>
>> t1 = clock()
>> ....
>> t2 = clock()
>>
>> It is possible that IPython is slower than multiprocessing in this case,
>> but something else is going on here.
>>
>> Cheers,
>>
>>
> Here are new benchmark results (in seconds) using your suggested timing
> approach:
>
> 0-) Duration using the linear processing:  1048.07685399
>
> 1-) Duration using TaskClient and 2 Engines:  701.550107956
>
> 2-) Duration using MultiEngineClient and 2 Engines:  663.629260063
>
> 3-) I can't get timings using this method when I use the multiprocessing
> module.
>
> I will send my 4 scripts to your email for further investigation. So far,
> the results don't seem much different from what they were originally.
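>
> A minimal sketch of that timing pattern around the linear run (process_one
> and the pairs below are hypothetical placeholders, not the real
> postprocessing code):
>
> from timeit import default_timer as clock
>
> def process_one(pf):
>     # hypothetical placeholder for the real postprocessing of one file
>     pass
>
> # example [path, file] pairs; the real ones are built from find_sea_files()
> pathfile = [['/data/flight1', 'a.sea'], ['/data/flight2', 'b.sea']]
>
> t1 = clock()
> for pf in pathfile:
>     process_one(pf)
> t2 = clock()
> print "Duration using the linear processing: ", t2 - t1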
>
>
>
>
>> Brian
>>
>>
>> On Sun, Oct 18, 2009 at 2:01 PM, Gökhan Sever <gokhansever@gmail.com> wrote:
>>
>>>
>>>
>>> On Sun, Oct 18, 2009 at 2:34 PM, Gökhan Sever <gokhansever@gmail.com> wrote:
>>>
>>>>
>>>> Moreeeeee speed-up :)
>>>>
>>>> Next step is to use the multiprocessing module.
>>>>
>>>
>>> I did two tests since I was not sure which timing to believe:
>>>
>>> real    6m37.591s
>>> user    10m16.450s
>>> sys    0m4.808s
>>>
>>> real    7m22.209s
>>> user    11m21.296s
>>> sys    0m5.540s
>>>
>>> from which I figured out that "real" is the number I want to look at.  So the
>>> improvement with respect to the original linear 18m 5s run is a 2.4 to 2.7x
>>> speed-up on a dual-core 2.5 GHz laptop using Python's multiprocessing
>>> module, which is great for only adding a few lines of code and slightly modifying
>>> my original process_all wrapper script.
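>>>
>>> (For reference, taking the real times above: 18m 5s ≈ 1085 s, and
>>> 1085 / 442 ≈ 2.45 while 1085 / 398 ≈ 2.73, hence the 2.4 to 2.7x range.)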
>>>
>>> Here is the code:
>>>
>>>
>>> #!/usr/bin/env python
>>>
>>> """
>>> Execute postprocessing_saudi script in parallel using multiprocessing
>>> module.
>>> """
>>>
>>> from multiprocessing import Pool
>>> from subprocess import call
>>> import os
>>>
>>>
>>> # walk the tree and record each .sea file name and the directory it lives in
>>> def find_sea_files():
>>>
>>>     file_list, path_list = [], []
>>>     init = os.getcwd()
>>>
>>>     for root, dirs, files in os.walk('.'):
>>>         dirs.sort()
>>>         for file in files:
>>>             if file.endswith('.sea'):
>>>                 file_list.append(file)
>>>                 os.chdir(root)
>>>                 path_list.append(os.getcwd())
>>>                 os.chdir(init)
>>>
>>>     return file_list, path_list
>>>
>>>
>>> # pf is a [path, file] pair: switch into that directory and run the external script
>>> def process_all(pf):
>>>     os.chdir(pf[0])
>>>     call(['postprocessing_saudi', pf[1]])
>>>
>>>
>>> if __name__ == '__main__':
>>>     pool = Pool(processes=2)              # start 2 worker processes
>>>     files, paths = find_sea_files()
>>>     # bundle each directory with its file so Pool.map can pass a single argument
>>>     pathfile = [[paths[i], files[i]] for i in range(len(files))]
>>>     pool.map(process_all, pathfile)
>>>
>>>
>>> The main difference is the changed map call, since multiprocessing's
>>> Pool.map supports only one iterable argument (see the sketch below). This
>>> approach also shows execution results on the terminal screen, unlike
>>> IPython's. I am assuming that, like IPython's, the multiprocessing module
>>> should be able to run on external nodes, which means that once I can set up
>>> a few fast external machines I can perform a few more tests.
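>>>
>>> A minimal sketch of that map difference, using a hypothetical two-argument
>>> worker and made-up values: built-in map accepts two sequences, while
>>> Pool.map hands the worker exactly one item per call, so the two arguments
>>> get bundled into a pair:
>>>
>>> from multiprocessing import Pool
>>>
>>> def work(path, name):             # hypothetical two-argument worker
>>>     pass
>>>
>>> def work_pair(pair):              # Pool.map passes a single item, so unpack it
>>>     return work(pair[0], pair[1])
>>>
>>> paths = ['/data/flight1', '/data/flight2']   # example values
>>> names = ['a.sea', 'b.sea']
>>>
>>> results_builtin = map(work, paths, names)    # built-in map: two sequences are fine
>>>
>>> if __name__ == '__main__':
>>>     pool = Pool(processes=2)
>>>     results_pool = pool.map(work_pair, zip(paths, names))  # one bundled iterable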
>>>
>>> --
>>> Gökhan
>>>
>>
>>
>
>
> --
> Gökhan
>



-- 
Gökhan

