[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)
Charles R Harris
charlesr.harris@gmail....
Sat Mar 22 20:48:47 CDT 2008
On Sat, Mar 22, 2008 at 7:35 PM, Scott Ransom <sransom@nrao.edu> wrote:
> Here are results under 64-bit linux using gcc-4.3 (which by
> default turns on the various sse flags). Note that -O3 is
> significantly better than -O2 for the "simple" calls:
>
> nimrod:~$ cat /proc/cpuinfo | grep "model name" | head -1
> model name : Intel(R) Xeon(R) CPU E5450 @ 3.00GHz
>
> nimrod:~$ gcc-4.3 --version
> gcc-4.3 (Debian 4.3.0-1) 4.3.1 20080309 (prerelease)
>
> nimrod:~$ gcc-4.3 -O2 vec_bench.c -o vec_bench
> nimrod:~$ ./vec_bench
> Testing methods...
> All OK
> Problem size        Simple                Intrin                Inline
>          100    0.0001ms (100.0%)     0.0001ms ( 70.8%)     0.0001ms ( 74.3%)
>         1000    0.0008ms (100.0%)     0.0006ms ( 70.3%)     0.0007ms ( 80.3%)
>        10000    0.0085ms (100.0%)     0.0061ms ( 72.0%)     0.0067ms ( 78.8%)
>       100000    0.0882ms (100.0%)     0.0627ms ( 71.1%)     0.0677ms ( 76.7%)
>      1000000    3.6748ms (100.0%)     3.3312ms ( 90.7%)     3.3139ms ( 90.2%)
>     10000000   37.1154ms (100.0%)    35.9762ms ( 96.9%)    36.1126ms ( 97.3%)
>
> nimrod:~$ gcc-4.3 -O3 vec_bench.c -o vec_bench
> nimrod:~$ ./vec_bench
> Testing methods...
> All OK
> Problem size        Simple                Intrin                Inline
>          100    0.0001ms (100.0%)     0.0001ms (111.1%)     0.0001ms (116.7%)
>         1000    0.0005ms (100.0%)     0.0006ms (111.3%)     0.0007ms (126.8%)
>        10000    0.0056ms (100.0%)     0.0061ms (108.6%)     0.0067ms (118.9%)
>       100000    0.0581ms (100.0%)     0.0626ms (107.8%)     0.0677ms (116.5%)
>      1000000    3.4549ms (100.0%)     3.3339ms ( 96.5%)     3.3255ms ( 96.3%)
>     10000000   34.8186ms (100.0%)    35.9767ms (103.3%)    36.1099ms (103.7%)
>
>
> nimrod:~$ ./vec_bench_dbl
> Testing methods...
> All OK
> Problem size        Simple                Intrin
>          100    0.0001ms (100.0%)     0.0001ms (132.5%)
>         1000    0.0009ms (100.0%)     0.0012ms (134.5%)
>        10000    0.0119ms (100.0%)     0.0124ms (104.1%)
>       100000    0.1226ms (100.0%)     0.1276ms (104.1%)
>      1000000    7.0047ms (100.0%)     6.6654ms ( 95.2%)
>     10000000   70.0060ms (100.0%)    71.9692ms (102.8%)
>
> nimrod:~$ gcc-4.3 -O3 vec_bench_dbl.c -o vec_bench_dbl
> nimrod:~$ ./vec_bench_dbl
> Testing methods...
> All OK
> Problem size        Simple                Intrin
>          100    0.0001ms (100.0%)     0.0002ms (289.8%)
>         1000    0.0007ms (100.0%)     0.0012ms (172.7%)
>        10000    0.0114ms (100.0%)     0.0124ms (109.4%)
>       100000    0.1159ms (100.0%)     0.1278ms (110.3%)
>      1000000    6.9252ms (100.0%)     6.6585ms ( 96.1%)
>     10000000   69.1913ms (100.0%)    71.9664ms (104.0%)
It looks to me like the best approach here is to generate operator-specific
loops for arithmetic, then check the step sizes in the loop for contiguous
data and, if the data are contiguous, branch to a block where the pointers
have been cast to the right type. The loop itself could even check for the
operator type by switching on the function address, so that the code
modifications could be localized. The compiler can do the rest.
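
A minimal sketch of the contiguous-data branch, for a double-precision add.
The name double_add and its simplified signature (byte pointers in args, an
element count n, byte strides in steps) are assumptions made for
illustration, not the actual numpy loop code:

```c
#include <stddef.h>

/* Hypothetical operator-specific inner loop for double addition.
 * args[0], args[1]: byte pointers to the two inputs; args[2]: the output.
 * steps[k]: byte stride of args[k]; n: number of elements. */
void double_add(char **args, ptrdiff_t n, const ptrdiff_t *steps)
{
    char *ip1 = args[0], *ip2 = args[1], *op = args[2];
    const ptrdiff_t sz = (ptrdiff_t)sizeof(double);
    ptrdiff_t i;

    if (steps[0] == sz && steps[1] == sz && steps[2] == sz) {
        /* Contiguous data: cast the pointers to the right type once,
         * so the compiler sees a plain unit-stride array loop. */
        const double *a = (const double *)ip1;
        const double *b = (const double *)ip2;
        double *out = (double *)op;
        for (i = 0; i < n; i++) {
            out[i] = a[i] + b[i];
        }
    } else {
        /* General case: advance each byte pointer by its own stride. */
        for (i = 0; i < n; i++) {
            *(double *)op = *(const double *)ip1 + *(const double *)ip2;
            ip1 += steps[0];
            ip2 += steps[1];
            op += steps[2];
        }
    }
}
```

Both branches compute the same result; the only point of the first one is
to hand the compiler a loop it can recognize as unit-stride.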
Chuck