[SciPy-user] Benchmark data
oliphant.travis at ieee.org
Fri Dec 9 14:01:41 CST 2005
Gerard Vermeulen wrote:
>On Fri, 09 Dec 2005 03:14:49 -0700
>Travis Oliphant <oliphant.travis at ieee.org> wrote:
>>I'd like people to try out scipy core in SVN. I made improvements to the
>>buffered ufunc section of code that I think will make a big difference
>>in the recently published benchmarks.
>indeed, it made a big difference (for big arrays scipy is now fastest on some tests).
>Below are my benchmark results on my DIY python.
>On my system and for large arrays (>4096), numarray is still fastest, scipy moved
>to second and Numeric is third.
>Numeric is still fastest for small arrays, scipy is second, numarray is third.
Numeric will always be faster for small-enough arrays, I think, because
it doesn't have the ufunc overhead. I just don't want it to be a lot
faster. We can improve the limiting scalar case in scipy_core by using
separate scalar math. It looks like we are doing reasonably well.
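The small-array case discussed above is dominated by per-call ufunc overhead rather than arithmetic. A rough way to see it is to time an array operation on a tiny array against plain Python scalar math (a sketch using modern numpy as a stand-in for 2005-era scipy.base):

```python
import timeit
import numpy as np  # stand-in here for scipy.base / scipy_core

# A length-2 array: the ufunc machinery (dispatch, buffering, type
# resolution) costs roughly the same as for a large array, but there is
# almost no arithmetic to amortize it over.
small = np.arange(2, dtype=float)

t_array = timeit.timeit(lambda: small + small, number=100_000)
t_scalar = timeit.timeit(lambda: 1.0 + 1.0, number=100_000)

print(f"tiny-array add : {t_array:.4f} s")
print(f"scalar add     : {t_scalar:.4f} s")
```

The gap between the two numbers is essentially the per-call overhead that a separate scalar-math fast path would avoid.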
>Invoking: python bench.py 12
>Importing test to scipy
>Importing base to scipy
>Importing basic to scipy
>Python 2.4.2 (#1, Dec 4 2005, 08:21:04)
>[GCC 3.4.3 (Mandrakelinux 10.2 3.4.3-7mdk)]
>Optimization flags: -DNDEBUG -O3 -march=i686
>CPU info: getNCPUs=2 has_mmx has_sse has_sse2 is_32bit is_Intel is_Pentium is_PentiumIV
>benchmark size = 12 (vectors of length 16777216)
>label     Numeric   numarray   scipy.base
>    1      0.4127    0.07423       0.3927
>    2      0.2734    0.2321        0.3234
>    3      0.1975    0.1821        0.2733
>    4      0.8747    0.5371        0.5588
>    5      0.2896    0.2342        0.2737
>    6      0.2066    0.1731        0.2718
>    7      0.8761    0.6286        0.5524
>    8      0.6546    0.4556        0.4533
>    9      9.488     7.566         8.717
>   10      9.506     8.064         8.745
>   11      7.879     6.301         7.305
>TOTAL     30.66     24.45         27.87
As mentioned before, it looks like the optimizer is doing something nice
on your system. One issue is arange, which could definitely be made
faster by having different "fillers" for different types. I'm still
astonished by the markedly different numbers you get compared to what
others have shown. Is this all the -O3 optimization kicking in?
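The "different fillers for different types" idea for arange can be sketched in Python: dispatch once on the result type to a type-specific fill routine, instead of running one generic loop that does a conversion on every element. The names below are illustrative, not scipy_core's actual internals:

```python
import math

def _fill_int(n, start, step):
    # Integer filler: pure integer increments, no per-element conversion.
    out = [0] * n
    val = start
    for i in range(n):
        out[i] = val
        val += step
    return out

def _fill_float(n, start, step):
    # Float filler: multiply-based fill avoids accumulated rounding error
    # from repeated addition.
    return [start + i * step for i in range(n)]

# One filler per type, chosen once per call rather than once per element.
_FILLERS = {int: _fill_int, float: _fill_float}

def arange_sketch(start, stop, step=1):
    # Sketch only: positive step assumed, no overflow or dtype handling.
    ty = type(start + step)                  # result type picks the filler
    n = max(0, math.ceil((stop - start) / step))
    return _FILLERS[ty](n, start, step)
```

In C the same structure would be a table of type-specific fill functions indexed by type number, so the hot loop contains no switch or conversion.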
The other issue is the sin and cosine functions. They don't have their
own inner loops; they call a generic inner loop with a function pointer
passed as data. Perhaps the optimizer can't do as much with that, or the
loop needs to be written with the optimizer in mind.
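The cost of that indirection can be illustrated even in pure Python: a generic loop that calls through a passed-in function on every element versus a dedicated loop with the operation bound once. (This is an analogy for the C situation, where the function pointer also blocks inlining and vectorization.)

```python
import math
import timeit

def generic_loop(func, data, out):
    # Generic inner loop: every element goes through an indirect call,
    # like the function-pointer-based generic ufunc loop.
    for i in range(len(data)):
        out[i] = func(data[i])

def sin_loop(data, out):
    # Dedicated inner loop: the operation is fixed, so the lookup happens
    # once, not once per element.
    sin = math.sin
    for i in range(len(data)):
        out[i] = sin(data[i])

data = [i * 0.001 for i in range(10_000)]
out = [0.0] * len(data)

t_generic = timeit.timeit(lambda: generic_loop(math.sin, data, out), number=100)
t_dedicated = timeit.timeit(lambda: sin_loop(data, out), number=100)
print(f"generic loop  : {t_generic:.3f} s")
print(f"dedicated loop: {t_dedicated:.3f} s")
```

A compiler faces the same trade-off more sharply: with a dedicated sin loop it can inline, unroll, and schedule; through an opaque pointer it usually cannot.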
Ultimately, though, I'd like to see some of the inner loops take
advantage of SSE (and equivalent) instructions when the number of
iterations is large enough. So, yes, I think we could get faster.
But first I'd like to get more data from more machines and compiler
flags to determine where the slowness is really coming from. It might
be good, for example, to break up one of lines 9, 10, and 11 so that at
least one sin and cos calculation is timed alone.
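Breaking the transcendental calls out of the combined benchmark lines could look something like this (a sketch using modern numpy in place of scipy.base; the exact expressions in bench.py lines 9-11 may differ):

```python
import timeit
import numpy as np  # stands in for scipy.base here

# Large vector, comparable in spirit to the published benchmark sizes.
x = np.arange(2**20, dtype=float)

# Time sin, cos, and a plain add individually, so the transcendental cost
# is not mixed in with arange, addition, etc. as in the combined lines.
for name, stmt in [("sin", lambda: np.sin(x)),
                   ("cos", lambda: np.cos(x)),
                   ("add", lambda: x + x)]:
    t = timeit.timeit(stmt, number=10)
    print(f"{name}: {t:.4f} s")
```

If sin and cos alone account for most of the gap between the three packages, the generic-inner-loop indirection is the place to look; if not, the remaining arithmetic and memory traffic is.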