Looks like you have been making progress... some comments:<br><br>* Something quite odd is going on. While it would be nice to get a 2.4-2.7x speedup on a dual-core<br>system, I don't think that result is real. I am not sure why you are seeing it, but it is *extremely* rare<br>
to see a speedup greater than the number of cores. It is possible, but I don't think your problem has<br>any of the characteristics that would make it so.<br>* From your description of the problem, IPython should be giving you nearly a 2x speedup, but what you see is considerably<br>
lower.<br><br>The combination of these things makes me think there is an aspect of all this that we do not understand yet.<br>I suspect that the method you are using to time your code is not accurate; I have seen this kind of <br>
thing before. Can you time it using a more accurate approach? Something like:<br><br>from timeit import default_timer as clock<br><br>t1 = clock()<br># ... code being timed ...<br>t2 = clock()<br>print(t2 - t1)  # elapsed wall-clock seconds<br><br>It is possible that IPython is slower than multiprocessing in this case, but I think something else is going on here as well.<br>
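A minimal sketch of that timing approach, assuming Python 3.3 or later. The hypothetical `timed` context manager records wall-clock time with `timeit.default_timer` and, for contrast, CPU time with `time.process_time`. Only the wall-clock number is comparable to a serial run: `process_time` counts CPU time of the current process only, so it leaves out work done in child processes such as multiprocessing workers.

```python
import time
from contextlib import contextmanager
from timeit import default_timer as clock  # wall-clock timer


@contextmanager
def timed():
    """Yield a dict that receives wall-clock and CPU seconds for the with-block.

    'wall' is elapsed real time (what the shell's `time` reports as real).
    'cpu' is CPU time of *this* process only (time.process_time, Python 3.3+);
    it excludes child processes such as multiprocessing workers.
    """
    result = {}
    wall_start, cpu_start = clock(), time.process_time()
    try:
        yield result
    finally:
        result['wall'] = clock() - wall_start
        result['cpu'] = time.process_time() - cpu_start


if __name__ == '__main__':
    with timed() as t:
        time.sleep(0.2)  # stand-in for the code being timed; sleeping uses almost no CPU
    print('wall %.3f s, cpu %.3f s' % (t['wall'], t['cpu']))
```

Wrapping a `pool.map(...)` call in `with timed() as t:` and printing `t['wall']` would give the number to compare against the serial run.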
<br>Cheers,<br><br>Brian<br><br><div class="gmail_quote">On Sun, Oct 18, 2009 at 2:01 PM, Gökhan Sever <span dir="ltr"><<a href="mailto:gokhansever@gmail.com">gokhansever@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br><br><div class="gmail_quote">On Sun, Oct 18, 2009 at 2:34 PM, Gökhan Sever <span dir="ltr"><<a href="mailto:gokhansever@gmail.com" target="_blank">gokhansever@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br><div class="gmail_quote"><div>Moreeeeee speed-up :)<div class="im"><br><br>Next step is to use the multiprocessing module.<br clear="all"></div></div></div></blockquote><div><br>I did two tests, since I was not sure which timing to believe:<br>
<br>real 6m37.591s<br>user 10m16.450s<br>sys 0m4.808s<br><br>real 7m22.209s<br>user 11m21.296s<br>sys 0m5.540s<br><br>From these results I figured out that real (wall-clock) time is the number I want to look at. Compared with the original serial run of 18m 5s, that is a 2.4x to 2.7x speed-up on a dual-core 2.5 GHz laptop using Python's multiprocessing module, which is great given that I only added a few lines of code and slightly modified my original process_all wrapper script. <br>
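The speed-up arithmetic can be checked with a small helper. This is a sketch assuming Python 3.4 or later and timing strings in the `XmY.YYYs` form printed by the shell's `time` keyword; the function names are hypothetical. Speed-up is serial wall-clock time divided by parallel wall-clock time. The `user` and `sys` figures add up CPU time over both cores, which is why they can exceed `real`.

```python
import re


def time_to_seconds(text):
    """Convert a shell `time` field such as '6m37.591s' or '18m 5s' to seconds."""
    match = re.fullmatch(r'(?:(\d+)m)?\s*(\d+(?:\.\d+)?)s', text.strip())
    if match is None:
        raise ValueError('unrecognised time format: %r' % text)
    minutes = int(match.group(1) or 0)
    return 60 * minutes + float(match.group(2))


def speedup(serial, parallel):
    """Speed-up = serial wall-clock time / parallel wall-clock time."""
    return time_to_seconds(serial) / time_to_seconds(parallel)


if __name__ == '__main__':
    # Serial run of 18m 5s against the two parallel 'real' timings.
    for run in ('6m37.591s', '7m22.209s'):
        print(run, round(speedup('18m 5s', run), 2))
```

For the two runs in this message that gives about 2.73 and 2.45.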
<br>Here is the code:<br><br><br>
#!/usr/bin/env python<br>
<br>
"""<br>
Execute the postprocessing_saudi script in parallel using the multiprocessing module.<br>
"""<br>
<br>
from multiprocessing import Pool<div class="im"><br>
from subprocess import call<br>
import os<br>
<br>
<br>
def find_sea_files():<br>
    """Return the .sea file names below the current directory and<br>
    the absolute path of the directory that holds each one."""<br>
    file_list, path_list = [], []<br>
<br>
    for root, dirs, files in os.walk('.'):<br>
        dirs.sort()  # visit subdirectories in a predictable order<br>
        for file in files:<br>
            if file.endswith('.sea'):<br>
                file_list.append(file)<br>
                path_list.append(os.path.abspath(root))<br>
<br>
    return file_list, path_list<br>
<br>
<br>
</div>def process_all(pf):<br>
    # pf is a (directory, filename) pair: run the script inside that directory<br>
    directory, filename = pf<br>
    call(['postprocessing_saudi', filename], cwd=directory)<br>
<br>
<br>
if __name__ == '__main__':<br>
    pool = Pool(processes=2)  # start 2 worker processes<div class="im"><br>
    files, paths = find_sea_files()<br></div>
    # Pool.map passes exactly one argument to the function, so pair each directory with its file<br>
    pathfile = zip(paths, files)<br>
    pool.map(process_all, pathfile)<br>
    pool.close()<br>
    pool.join()<br>
<br>
<br>
The main change is in the map call: unlike Python's built-in map, Pool.map accepts only one iterable argument, so each directory and file name have to be packed together into a single item. Unlike the IPython version, this approach also shows the script's output in the terminal. I am assuming that, like IPython, the multiprocessing module can also run jobs on external nodes, which means that once I set up a few fast external machines I can run a few more tests. <br>
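For comparison, a sketch of the same wrapper assuming Python 3.3 or later, where `Pool.starmap` unpacks each `(directory, filename)` pair into two arguments and `subprocess.call(..., cwd=...)` runs the script in the right directory without `os.chdir()`. The helpers here (a `find_sea_files` that returns pairs, `process_one`, the `command` parameter) are hypothetical, and `postprocessing_saudi` is assumed to be an executable on the `PATH`.

```python
import os
import subprocess
from multiprocessing import Pool


def find_sea_files(top='.'):
    """Return (absolute_dir, filename) pairs for every *.sea file under `top`."""
    pairs = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # descend into subdirectories in sorted order
        for name in sorted(files):
            if name.endswith('.sea'):
                pairs.append((os.path.abspath(root), name))
    return pairs


def process_one(directory, filename, command='postprocessing_saudi'):
    """Run `command filename` with `directory` as the working directory.

    Hypothetical helper: cwd= leaves the worker process's own cwd unchanged.
    Returns the command's exit code.
    """
    return subprocess.call([command, filename], cwd=directory)


def main(top='.', processes=2):
    pairs = find_sea_files(top)
    if not pairs:
        return []  # nothing to do, so don't start any worker processes
    with Pool(processes=processes) as pool:
        # starmap calls process_one(directory, filename) for each pair
        return pool.starmap(process_one, pairs)


if __name__ == '__main__':
    exit_codes = main()
    print('%d file(s) processed, %d failed'
          % (len(exit_codes), sum(1 for code in exit_codes if code != 0)))
```

Returning the exit codes from `starmap` makes it easy to spot which files failed without reading through the terminal output.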
</div></div><br>-- <br><font color="#888888">Gökhan<br>
</font></blockquote></div><br>