[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

Emanuele Olivetti emanuele@relativita....
Sun Mar 23 03:20:28 CDT 2008


James Philbin wrote:
> OK, i've written a simple benchmark which implements an elementwise
> multiply (A=B*C) in three different ways (standard C, intrinsics, hand
> coded assembly). On the face of things the results seem to indicate
> that the vectorization works best on medium sized inputs. If people
> could post the results of running the benchmark on their machines
> (takes ~1min) along with the output of gcc --version and their chip
> model, that wd be v useful.
>
> It should be compiled with: gcc -msse -O2 vec_bench.c -o vec_bench
>
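
For readers without vec_bench.c at hand, here is a minimal sketch of what the "Simple" and "Intrin" variants of A=B*C might look like. It is not the code from the benchmark: the function names mul_simple and mul_intrin, the float element type, and the use of unaligned loads are all assumptions made for illustration, and the hand-coded assembly variant is left out.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics; enabled by -msse on 32-bit x86 */
#endif

/* Assumed "Simple" variant: one scalar multiply per loop iteration. */
void mul_simple(float *a, const float *b, const float *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}

/* Assumed "Intrin" variant: four floats per iteration when SSE is available. */
void mul_intrin(float *a, const float *b, const float *c, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Unaligned loads and stores, so the arrays need no special alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(a + i, _mm_mul_ps(vb, vc));
    }
#endif
    /* Scalar tail: the last 0..3 elements, or everything without SSE. */
    for (; i < n; i++)
        a[i] = b[i] * c[i];
}
```

Both functions should produce identical results for the same inputs, which is presumably what the "Testing methods... All OK" line in the output below checks before any timing is done.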

CPU: Intel(R) Core(TM)2 CPU  T7400  @ 2.16GHz
(macbook, intel core 2 duo)

gcc (GCC) 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)
(ubuntu gutsy gibbon 7.10)

$ ./vec_bench
Testing methods...
All OK

        Problem size              Simple              Intrin              Inline
                 100   0.0003ms (100.0%)   0.0002ms ( 68.3%)   0.0002ms ( 75.6%)
                1000   0.0023ms (100.0%)   0.0018ms ( 76.7%)   0.0020ms ( 87.1%)
               10000   0.0361ms (100.0%)   0.0193ms ( 53.4%)   0.0338ms ( 93.7%)
              100000   0.2839ms (100.0%)   0.1351ms ( 47.6%)   0.0937ms ( 33.0%)
             1000000   4.2108ms (100.0%)   4.1234ms ( 97.9%)   4.0886ms ( 97.1%)
            10000000  45.3192ms (100.0%)  45.5359ms (100.5%)  45.3466ms (100.1%)


Note that there is some variance in the results. Here is a second run to
give an idea (look at Inline, size=10000):

$ ./vec_bench
Testing methods...
All OK

        Problem size              Simple              Intrin              Inline
                 100   0.0003ms (100.0%)   0.0002ms ( 69.5%)   0.0002ms ( 74.1%)
                1000   0.0024ms (100.0%)   0.0018ms ( 75.9%)   0.0020ms ( 86.4%)
               10000   0.0324ms (100.0%)   0.0186ms ( 57.3%)   0.0226ms ( 69.6%)
              100000   0.2840ms (100.0%)   0.1171ms ( 41.2%)   0.0939ms ( 33.1%)
             1000000   4.4034ms (100.0%)   4.3657ms ( 99.1%)   4.0465ms ( 91.9%)
            10000000  44.4854ms (100.0%)  43.9502ms ( 98.8%)  43.6824ms ( 98.2%)
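
One common way to damp this kind of run-to-run variance is to repeat each measurement several times and keep the fastest run. The sketch below illustrates that idea only; I do not know how vec_bench.c actually times its loops, and the names now_ms, best_of and count_call are hypothetical. On older glibc versions, clock_gettime may need -lrt at link time.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime in strict C modes */
#include <stddef.h>
#include <time.h>

/* Current time in milliseconds, read from a monotonic clock. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Hypothetical helper: call fn(arg) `repeats` times and return the
 * fastest single call in milliseconds (-1.0 if repeats <= 0). */
double best_of(void (*fn)(void *), void *arg, int repeats)
{
    double best = -1.0;
    int r;
    for (r = 0; r < repeats; r++) {
        double t0 = now_ms();
        double dt;
        fn(arg);
        dt = now_ms() - t0;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}

/* Placeholder workload for illustration: counts how often it was called. */
void count_call(void *arg)
{
    ++*(int *)arg;
}
```

Taking the minimum rather than the mean tends to filter out interference from other processes, though it will not remove systematic effects such as cache state or CPU frequency scaling.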


HTH

Emanuele


