[Numpy-discussion] Comparing NumPy/IDL Performance
Mon Sep 26 10:24:01 CDT 2011
While I also echo Johann's points about the arbitrariness and non-utility of benchmarking I'll briefly comment just on just a few tests to help out with getting things into idiomatic python/numpy:
Tests 1 and 2 are fairly pointless (empty for loop and empty procedure) that won't actually influence the running time of well-written non-pathological code.
#Test 3 - Add 200000 scalar ints
nrep = 2000000 * scale_factor
for i in range(nrep):
a = i + 1
well, python looping is slow... one doesn't do such loops in idiomatic code if the underlying intent can be re-cast into array operations in numpy. But here the test is on such a simple operation that it's not clear how to recast in a way that would remain reasonable. Ideally you'd test something like:
i = numpy.arange(200000)
for j in range(scale_factor):
a = i + 1
but that sort of changes what the test is testing.
Finally, test 21:
#Test 21 - Smooth 512 by 512 byte array, 5x5 boxcar
for i in range(nrep):
b = scipy.ndimage.filters.median_filter(a, size=(5, 5))
timer.log('Smooth 512 by 512 byte array, 5x5 boxcar, %d times' % nrep)
A median filter is definitely NOT a boxcar filter! You want "uniform_filter":
In : a = numpy.empty((1000,1000))
In : timeit scipy.ndimage.filters.median_filter(a, size=(5, 5))
10 loops, best of 3: 93.2 ms per loop
In : timeit scipy.ndimage.filters.uniform_filter(a, size=(5, 5))
10 loops, best of 3: 27.7 ms per loop
On Sep 26, 2011, at 10:19 AM, Keith Hughitt wrote:
> Hi all,
> Myself and several colleagues have recently started work on a Python library for solar physics, in order to provide an alternative to the current mainstay for solar physics, which is written in IDL.
> One of the first steps we have taken is to create a Python port of a popular benchmark for IDL (time_test3) which measures performance for a variety of (primarily matrix) operations. In our initial attempt, however, Python performs significantly poorer than IDL for several of the tests. I have attached a graph which shows the results for one machine: the x-axis is the test # being compared, and the y-axis is the time it took to complete the test, in milliseconds. While it is possible that this is simply due to limitations in Python/Numpy, I suspect that this is due at least in part to our lack in familiarity with NumPy and SciPy.
> So my question is, does anyone see any places where we are doing things very inefficiently in Python?
> In order to try and ensure a fair comparison between IDL and Python there are some things (e.g. the style of timing and output) which we have deliberately chosen to do a certain way. In other cases, however, it is likely that we just didn't know a better method.
> Any feedback or suggestions people have would be greatly appreciated. Unfortunately, due to the proprietary nature of IDL, we cannot share the original version of time_test3, but hopefully the comments in time_test3.py will be clear enough.
> NumPy-Discussion mailing list
More information about the NumPy-Discussion