[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

Neal Becker ndbecker2@gmail....
Sat Mar 22 20:47:02 CDT 2008


Thomas Grill wrote:

> Hi,
> here are my results:
> 
> Intel Core 2 Duo, 2.16GHz, 667MHz bus, 4MB Cache
> running under OSX 10.5.2
> 
> please note that the auto-vectorizer of gcc-4.3 is doing really well....
> 
> gr~~~
> 
> ---------------------
> 
> gcc version 4.0.1 (Apple Inc. build 5465)
> 
> xbook-2:temp thomas$ gcc -msse -O2 vec_bench.c -o vec_bench
> xbook-2:temp thomas$ ./vec_bench
> Testing methods...
> All OK
> 
>         Problem size              Simple              Intrin              Inline
>                  100   0.0002ms (100.0%)   0.0001ms ( 83.2%)   0.0001ms ( 85.1%)
>                 1000   0.0014ms (100.0%)   0.0014ms ( 99.5%)   0.0014ms ( 97.6%)
>                10000   0.0180ms (100.0%)   0.0137ms ( 76.1%)   0.0103ms ( 56.9%)
>               100000   0.1307ms (100.0%)   0.1153ms ( 88.2%)   0.0952ms ( 72.8%)
>              1000000   4.0309ms (100.0%)   4.1641ms (103.3%)   4.0129ms ( 99.6%)
>             10000000  43.2557ms (100.0%)  43.5919ms (100.8%)  42.6391ms ( 98.6%)
> 
> 
> 
> gcc version 4.3.0 20080125 (experimental) (GCC)
> 
> xbook-2:temp thomas$ gcc-4.3 -msse -O2 vec_bench.c -o vec_bench
> xbook-2:temp thomas$ ./vec_bench
> Testing methods...
> All OK
> 
>         Problem size              Simple              Intrin              Inline
>                  100   0.0002ms (100.0%)   0.0001ms ( 77.4%)   0.0001ms ( 72.0%)
>                 1000   0.0017ms (100.0%)   0.0014ms ( 84.4%)   0.0014ms ( 79.4%)
>                10000   0.0173ms (100.0%)   0.0148ms ( 85.4%)   0.0104ms ( 59.9%)
>               100000   0.1276ms (100.0%)   0.1243ms ( 97.4%)   0.0952ms ( 74.6%)
>              1000000   4.0466ms (100.0%)   4.1168ms (101.7%)   4.0348ms ( 99.7%)
>             10000000  43.1842ms (100.0%)  43.2989ms (100.3%)  44.2171ms (102.4%)
> 
> xbook-2:temp thomas$ gcc-4.3 -msse -O2 -ftree-vectorize vec_bench.c -o vec_bench
> xbook-2:temp thomas$ ./vec_bench
> Testing methods...
> All OK
> 
>         Problem size              Simple              Intrin              Inline
>                  100   0.0001ms (100.0%)   0.0001ms (126.6%)   0.0001ms (120.3%)
>                 1000   0.0011ms (100.0%)   0.0014ms (136.3%)   0.0014ms (127.9%)
>                10000   0.0144ms (100.0%)   0.0153ms (106.3%)   0.0103ms ( 72.0%)
>               100000   0.1027ms (100.0%)   0.1243ms (121.0%)   0.0953ms ( 92.8%)
>              1000000   3.9691ms (100.0%)   4.1197ms (103.8%)   4.0252ms (101.4%)
>             10000000  42.1922ms (100.0%)  43.6721ms (103.5%)  43.4035ms (102.9%)
gcc version 4.3.0 20080307 (Red Hat 4.3.0-2) (GCC) 
gcc -msse -O2 -ftree-vectorize vec_bench.c -o vec_bench
mock-chroot> ./vec_bench
Testing methods...
All OK

        Problem size              Simple              Intrin              Inline
                 100   0.0001ms (100.0%)   0.0001ms (141.6%)   0.0001ms (108.0%)
                1000   0.0008ms (100.0%)   0.0011ms (149.9%)   0.0008ms (100.4%)
               10000   0.0135ms (100.0%)   0.0197ms (145.8%)   0.0133ms ( 98.8%)
              100000   0.6415ms (100.0%)   0.4918ms ( 76.7%)   0.5052ms ( 78.8%)
             1000000   7.5364ms (100.0%)   7.9987ms (106.1%)   7.4832ms ( 99.3%)
            10000000  76.3927ms (100.0%)  76.8933ms (100.7%)  75.1002ms ( 98.3%)
model name      : AMD Athlon(tm) 64 Processor 3200+
stepping        : 10
cpu MHz         : 2000.068
cache size      : 1024 KB

Now the same, but with the compiler reported by gcc --version:
gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
Testing methods...
All OK

        Problem size              Simple              Intrin              Inline
                 100   0.0002ms (100.0%)   0.0001ms ( 77.2%)   0.0001ms ( 58.7%)
                1000   0.0015ms (100.0%)   0.0011ms ( 73.5%)   0.0008ms ( 52.6%)
               10000   0.0214ms (100.0%)   0.0195ms ( 90.9%)   0.0363ms (169.3%)
              100000   0.6620ms (100.0%)   0.5614ms ( 84.8%)   0.5527ms ( 83.5%)
             1000000   7.5975ms (100.0%)   7.3826ms ( 97.2%)   7.3380ms ( 96.6%)
            10000000  75.8361ms (100.0%)  84.0476ms (110.8%)  77.2884ms (101.9%)


