[Numpy-discussion] Fwd: Re: Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

Francesc Altet faltet@carabos....
Sun Mar 23 07:54:41 CDT 2008


Hi,

Here are my results for an AMD Opteron machine:

Using gcc version 4.1.3 (SUSE Linux) | Dual Core AMD Opteron 270 @ 2 GHz
$ gcc -msse -O2 vec_bench.c -o vec_bench 
$ ./vec_bench
Testing methods...
All OK

        Problem size              Simple              Intrin              Inline
                 100   0.0005ms (100.0%)   0.0003ms ( 48.5%)   0.0002ms ( 36.6%)
                1000   0.0030ms (100.0%)   0.0023ms ( 75.3%)   0.0015ms ( 51.2%)
               10000   0.0423ms (100.0%)   0.0387ms ( 91.5%)   0.0271ms ( 63.9%)
              100000   0.6138ms (100.0%)   0.5978ms ( 97.4%)   0.5834ms ( 95.0%)
             1000000   5.1213ms (100.0%)   5.0689ms ( 99.0%)   4.8771ms ( 95.2%)
            10000000  51.6820ms (100.0%)  51.0792ms ( 98.8%)  51.1346ms ( 98.9%)

Using gcc version 4.2.1 (SUSE Linux) | Dual Core AMD Opteron 270 @ 2 GHz
$ gcc -msse -O2 vec_bench.c -o vec_bench 
$ ./vec_bench
Testing methods...
All OK

        Problem size              Simple              Intrin              Inline
                 100   0.0005ms (100.0%)   0.0003ms ( 49.0%)   0.0002ms ( 37.6%)
                1000   0.0030ms (100.0%)   0.0023ms ( 75.4%)   0.0016ms ( 51.5%)
               10000   0.0422ms (100.0%)   0.0387ms ( 91.7%)   0.0273ms ( 64.7%)
              100000   0.5833ms (100.0%)   0.5190ms ( 89.0%)   0.4756ms ( 81.5%)
             1000000   5.2302ms (100.0%)   4.6074ms ( 88.1%)   4.4121ms ( 84.4%)
            10000000  50.2559ms (100.0%)  48.5409ms ( 96.6%)  49.2436ms ( 98.0%)


and for my laptop, which has a Pentium 4 Mobile @ 2 GHz:

Using gcc version 4.1.3 (Ubuntu 4.1.2-16ubuntu2)
$ gcc -msse -O2 vec_bench.c -o vec_bench 
$ ./vec_bench
Testing methods...
All OK

        Problem size              Simple              Intrin              Inline
                 100   0.0002ms (100.0%)   0.0002ms ( 88.8%)   0.0002ms (103.1%)
                1000   0.0020ms (100.0%)   0.0015ms ( 75.9%)   0.0021ms (103.5%)
               10000   0.0198ms (100.0%)   0.1507ms (761.8%)   0.0205ms (103.6%)
              100000   1.6296ms (100.0%)   1.2533ms ( 76.9%)   1.2586ms ( 77.2%)
             1000000  13.9571ms (100.0%)  12.8786ms ( 92.3%)  13.6840ms ( 98.0%)
            10000000  135.3217ms (100.0%)  128.5314ms ( 95.0%)  128.5189ms ( 95.0%)

Using gcc version 4.2.1 (Ubuntu 4.2.1-5ubuntu4)
$ gcc -msse -O2 vec_bench.c -o vec_bench 
$ ./vec_bench
Testing methods...
All OK

        Problem size              Simple              Intrin              Inline
                 100   0.0002ms (100.0%)   0.0002ms ( 90.6%)   0.0002ms (103.9%)
                1000   0.0022ms (100.0%)   0.0017ms ( 75.2%)   0.0020ms ( 90.1%)
               10000   0.0181ms (100.0%)   0.2540ms (1403.8%)   0.0319ms (176.5%)
              100000   1.2600ms (100.0%)   1.2710ms (100.9%)   1.3510ms (107.2%)
             1000000  12.9181ms (100.0%)  12.8595ms ( 99.5%)  12.9160ms (100.0%)
            10000000  128.8301ms (100.0%)  128.2373ms ( 99.5%)  128.4255ms ( 99.7%)

It is curious to see a venerable Pentium 4 running this code about 2x 
faster than a powerful AMD Opteron for small datasets (<10000), and at a 
speed similar to that of recent Core2 processors.  I suppose the 
first-level cache in Pentiums is pretty fast.

Cheers,

-- 
Francesc Altet

-------------------------------------------------------

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

