<br><br><div class="gmail_quote">On Sun, Oct 18, 2009 at 7:15 PM, Gökhan Sever <span dir="ltr"><<a href="mailto:gokhansever@gmail.com">gokhansever@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br><br><div class="gmail_quote">2009/10/18 Brian Granger <span dir="ltr"><<a href="mailto:ellisonbg.net@gmail.com" target="_blank">ellisonbg.net@gmail.com</a>></span><div class="im">
<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Looks like you have been making progress... some comments:<br><br>* Something quite odd is going on. While it would be nice to get a 2.4-2.7x speedup on a dual-core<br>system, I don't think that result is real. I am not sure why you are seeing it, but it is *extremely* rare<br>
to see a speedup greater than the number of cores. It is possible, but I don't think your problem has<br>any of the characteristics that would make it so.<br></blockquote></div></div></blockquote><div><br>Your suspicion is right. I was making a clean run on each file, that is, deleting everything except the .sea files in the folders. With this setup the multiprocessing module's pooling approach doesn't work: it does not seem to run the external script to completion. However, when I leave the processed outputs in the folders and rerun the script, it works and takes much less time than the IPython parallel run. Now the question is how to explain this behaviour.<br>
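One way to check whether the external script really ran during a clean run is to keep each subprocess exit status that pool.map returns instead of discarding it. A minimal sketch in current Python 3, assuming the script signals failure with a nonzero exit status; the two commands below are hypothetical stand-ins that only start a Python interpreter (one succeeds, one exits with status 1):

```python
import subprocess
import sys
from multiprocessing import Pool


def run_one(cmd):
    """Run one external command and return (command, exit status)."""
    return cmd, subprocess.call(cmd)


def report_failures(results):
    """Keep only the commands whose exit status was nonzero."""
    return [cmd for cmd, status in results if status != 0]


if __name__ == '__main__':
    # Hypothetical stand-ins for the real postprocessing commands.
    commands = [
        [sys.executable, '-c', 'pass'],
        [sys.executable, '-c', 'import sys; sys.exit(1)'],
    ]
    with Pool(processes=2) as pool:
        results = pool.map(run_one, commands)
    for cmd in report_failures(results):
        print('FAILED:', ' '.join(cmd))
```

If the clean runs print FAILED lines, the problem is in the script's own inputs rather than in the pool.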
<br>End of my 2.4 to 2.7X speed-up happiness :)<br><br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="gmail_quote"><div class="im">
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">* From your description of the problem, IPython should be giving you nearly a 2x speedup, but it is quite a bit<br>
lower.<br><br>The combination of these things makes me think there is an aspect of all this we do not understand yet.<br>I suspect that the method you are using to time your code is not accurate; I have seen this kind of<br>
thing before. Can you time it using a more accurate approach? Something like:<br><pre>from timeit import default_timer as clock

t1 = clock()
# ... code being timed ...
t2 = clock()
print('elapsed: %.2f s' % (t2 - t1))</pre>It is possible that IPython is slower than multiprocessing in this case, but something else is going on here.<br>
<br>Cheers,<br><font color="#888888"><br></font></blockquote></div><div><br>Here are the new benchmark results (in seconds) using your suggested timing approach:<br><br>0-) Duration using linear processing: 1048.07685399<br>
<br>
1-) Duration using TaskClient and 2 Engines: 701.550107956<br>
<br>2-) Duration using MultiEngineClient and 2 Engines: 663.629260063<br><br>3-) I can't get timings with this method when I use the multiprocessing module.<br><br>I will email you my 4 scripts for further investigation. So far, the results don't look much different from the original ones.<br>
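To get a timing for the multiprocessing run with the same default_timer approach, the timer can simply wrap the pool.map call. A minimal sketch, assuming a hypothetical CPU-bound busy() function stands in for processing one .sea file; it times the same work done linearly and with a 2-process pool (pool start-up is left out of the parallel timing):

```python
from multiprocessing import Pool
from timeit import default_timer as clock


def busy(n):
    """Hypothetical CPU-bound stand-in for processing one .sea file."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_call(fn, *args):
    """Call fn(*args) and return (result, elapsed wall-clock seconds)."""
    t1 = clock()
    result = fn(*args)
    t2 = clock()
    return result, t2 - t1


if __name__ == '__main__':
    work = [2000000] * 8  # hypothetical workload: 8 equally sized "files"
    # list(...) forces the lazy built-in map to do the work inside the timer
    _, linear = time_call(list, map(busy, work))
    with Pool(processes=2) as pool:
        _, parallel = time_call(pool.map, busy, work)
    print('linear:   %.2f s' % linear)
    print('parallel: %.2f s' % parallel)
    print('speedup:  %.2fx' % (linear / parallel))
```

On a dual-core machine the printed speedup should stay at or below about 2.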
<br><br> </div><div><div></div><div class="h5"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<font color="#888888">Brian</font><div><div></div><div><br><br><div class="gmail_quote">On Sun, Oct 18, 2009 at 2:01 PM, Gökhan Sever <span dir="ltr"><<a href="mailto:gokhansever@gmail.com" target="_blank">gokhansever@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br><br><div class="gmail_quote">On Sun, Oct 18, 2009 at 2:34 PM, Gökhan Sever <span dir="ltr"><<a href="mailto:gokhansever@gmail.com" target="_blank">gokhansever@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br><div class="gmail_quote"><div>Moreeeeee speed-up :)<div><br><br>Next step is to use the multiprocessing module.<br clear="all"></div></div></div></blockquote><div><br>I did two tests since I was not sure which timing to believe:<br>
<br>real 6m37.591s<br>user 10m16.450s<br>sys 0m4.808s<br><br>real 7m22.209s<br>user 11m21.296s<br>sys 0m5.540s<br><br>From these I figured out that the real (wall-clock) time is the one I want to look at. So, relative to the original linear run of 18m 5s, that is a 2.4 to 2.7X speed-up on a dual-core 2.5 GHz laptop using Python's multiprocessing module, which is great considering I only added a few lines of code and slightly modified my original process_all wrapper script.<br>
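The speed-up figures come from dividing the linear wall-clock time by each parallel real time. A small sketch of that arithmetic from the bash time fields quoted above (the to_seconds helper is just for illustration); note that both ratios come out above 2, the number of cores, which is exactly what looked suspicious:

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` field such as '6m37.591s' to seconds."""
    match = re.fullmatch(r'(\d+)m([\d.]+)s', stamp)
    if match is None:
        raise ValueError('unrecognised time field: %r' % stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


def speedup(baseline, parallel):
    """Speed-up = linear wall-clock time / parallel wall-clock time."""
    return baseline / parallel


baseline = to_seconds('18m5s')  # original linear run: 1085 s
for real in ('6m37.591s', '7m22.209s'):
    print('%s -> %.2fx' % (real, speedup(baseline, to_seconds(real))))
# prints:
# 6m37.591s -> 2.73x
# 7m22.209s -> 2.45x
```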
<br>Here is the code:<br><pre>#!/usr/bin/env python

"""
Execute the postprocessing_saudi script in parallel using the multiprocessing module.
"""

from multiprocessing import Pool
from subprocess import call
import os


def find_sea_files():
    """Walk the tree below the current directory and return the .sea file
    names together with the absolute paths of their folders."""
    file_list, path_list = [], []
    for root, dirs, files in os.walk('.'):
        dirs.sort()  # visit subfolders in a predictable order
        for file in files:
            if file.endswith('.sea'):
                file_list.append(file)
                path_list.append(os.path.abspath(root))
    return file_list, path_list


def process_all(pf):
    """Run postprocessing_saudi on one file from inside that file's folder.
    pf is a (path, filename) pair; the script's exit status is returned."""
    path, filename = pf
    # cwd= starts the child in `path` without changing the worker's own cwd
    return call(['postprocessing_saudi', filename], cwd=path)


if __name__ == '__main__':
    pool = Pool(processes=2)  # start 2 worker processes
    files, paths = find_sea_files()
    pathfile = list(zip(paths, files))
    pool.map(process_all, pathfile)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the workers to exit
</pre>The main change is in the map call: Pool.map accepts only one iterable (unlike the built-in map, which accepts several), so each path and file name is packed into a single pair. Unlike the IPython version, this approach also shows the script's output on the terminal. I am assuming that, like IPython, the multiprocessing module can also run on external nodes; if so, once I set up a few fast external machines I can run more tests.<br>
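On the map point: the built-in map walks several iterables in step, while Pool.map hands a single item to each worker call, hence the packed pairs. A minimal sketch contrasting the options; describe() and the paths and file names are hypothetical, and Pool.starmap needs Python 3.3 or later:

```python
from multiprocessing import Pool


def describe(path, filename):
    """Hypothetical two-argument worker standing in for running the
    postprocessing script on `filename` inside folder `path`."""
    return '%s/%s' % (path, filename)


def describe_pair(pf):
    """Single-argument wrapper: the shape Pool.map expects."""
    path, filename = pf
    return describe(path, filename)


if __name__ == '__main__':
    paths = ['/data/a', '/data/b']    # hypothetical folders
    files = ['run1.sea', 'run2.sea']  # hypothetical .sea files
    # The built-in map can walk two iterables in step...
    print(list(map(describe, paths, files)))
    with Pool(processes=2) as pool:
        # ...but Pool.map passes one item per call, so pack pairs first,
        print(pool.map(describe_pair, zip(paths, files)))
        # or let Pool.starmap unpack each pair into two arguments.
        print(pool.starmap(describe, zip(paths, files)))
```

All three calls produce the same list, so the packed-pair approach is mainly needed on Pythons without starmap.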
</div></div><br>-- <br><font color="#888888">Gökhan<br>
</font></blockquote></div><br>
</div></div></blockquote></div></div></div><br><br clear="all"><br>-- <br><font color="#888888">Gökhan<br>
</font></blockquote></div><br><br clear="all"><br>-- <br>Gökhan<br>