[SciPy-dev] fast gaussian filtering

Zachary Pincus zachary.pincus@yale....
Fri Dec 19 11:31:41 CST 2008


> I've noticed that SciPy implements Gaussian filtering (in the ndimage
> package) using convolution. It is implemented by loops in Python, not
> even using a vectorized numpy.convolve or an FFT. For the sake of  
> speed,
> SciPy is using the worst thinkable implementation.

What makes you think this?

A brief look at the implementation in scipy/ndimage/filters.py shows  
that scipy.ndimage.gaussian_filter, after doing some set-up, defers to  
gaussian_filter1d to do the filtering on each dimension (as Gaussian  
filtering is separable). Then gaussian_filter1d uses for-loops to do  
some further set-up, like calculating the filter kernel, and then  
defers to correlate1d for the actual filtering. This function does  
some error-checking, and then calls _nd_image.correlate1d. The  
_nd_image library is a C extension module: _nd_image.correlate1d  
corresponds to Py_Correlate1D, which is defined in scipy/ndimage/src/ 
nd_image.c, and which calls to NI_Correlate1D to do the lifting.  
NI_Correlate1D, in scipy/ndimage/src/ni_filters.c, implements the  
filtering in a straightforward loop, but also takes advantage where it  
can of filter-kernel symmetry or anti-symmetry.

Aside from the several thin layers of error-checking and set-up, the  
code does the filtering in C in about the fastest way possible that  
doesn't use recursion or FFT. (And even then, I don't know under what  
circumstances the direct-convolution method might still win --  
probably with small kernels.)

> To make it fast:
> Gaussian filtering can be implemented using recursion (Young & Vliet,
> 1995; Signal Processing, 44: 139-151). The recursive approximation is
> very accurate and does not introduce ringing. It is anti-causal
> (forward-backward) and has zero phase response. The filtering can be
> done in C using scipy's signal.lfilter function.

Your implementation looks pretty straightforward. Here are some  
timings (sm_gaussian is your Gaussian2D, nd_gaussian is  

In: img = numpy.random.rand(100, 100)

In: timeit sm_gaussian(img, 2)
1000 loops, best of 3: 1.54 ms per loop

In: timeit nd_gaussian(img, 2)
1000 loops, best of 3: 1.26 ms per loop

In: timeit sm_gaussian(img, 20)
1000 loops, best of 3: 1.55 ms per loop

In: timeit nd_gaussian(img, 20)
100 loops, best of 3: 7.41 ms per loop

In: img = numpy.random.rand(1000, 1000)

In: timeit sm_gaussian(img, 2)
10 loops, best of 3: 168 ms per loop

In: timeit nd_gaussian(img, 2)
10 loops, best of 3: 92.8 ms per loop

In: timeit sm_gaussian(img, 20)
10 loops, best of 3: 167 ms per loop

In: timeit nd_gaussian(img, 20)
10 loops, best of 3: 657 ms per loop

As expected, the recursive approach is a definite win with larger  
kernels, but with small kernels the constant factors dominate and the  
straightforward implementation is narrowly faster. At the very least,  
you should post this implementation on the cookbook page...


More information about the Scipy-dev mailing list