[SciPy-User] Rotated, Anisotropic Gaussian Filtering (Kernel Density Estimation)

Aronne Merrelli aronne.merrelli@gmail....
Fri Oct 19 10:25:45 CDT 2012

On Thu, Oct 18, 2012 at 11:20 AM, Patrick Marsh <patrick.marsh@noaa.gov> wrote:
>> Patrick,
>> I pulled down the gist code to run the cython annotate on it, and
>> found that there were some declarations missing. I added these at the
>> top of the file:
>> ctypedef np.float64_t DTYPE64_t
>> DTYPE64 = np.float
>> from libc.math cimport exp, sin, cos
>> And it compiles and looks OK to me; there isn't anything obvious that
>> would make it slow. However, depending on how you defined exp, sin,
>> cos, in the file you are actually running, if you are linking those
>> back to the numpy versions instead of the C versions, this code would
>> be pretty slow.
> Hi, Aronne,
> Thanks for the great response. I really appreciate You did catch a couple of
> declarations I missed when posting the gist. (I created the gist from a
> module I have, and forgot to copy all of the header stuff.) I've fixed that
> now, but essentially I declare exp, cos, sin, and fabs as:
> cdef extern from 'math.h':
>     float exp(float x)
>     float cos(float x)
>     float sin(float x)
>     float fabs(float x)
> Is the way you define the math functions better/faster? My (limited)
> thinking is that your method and my method achieve the same thing, but my
> understanding of Cython is simply from hacking around with it and could
> easily be in err.

I'm not a Cython expert, but as I understand it both definitions
should produce functionally equivalent C code. So it shouldn't affect
the speed at all.

>> Otherwise, just after skimming the cython code, it looks like the
>> gaussian kernel (partweight in the cython code) is fixed, so this is
>> really just a convolution. If you compute partweight once, in python,
>> then you can just use the convolution function in scipy. This should
>> be as fast as any simple cython code, I'd think, and it is a lot
>> simpler.
> When I first wrote this code (a couple of years ago) I didn't know about
> convolutions. However, as I'm learning about them, I see that what I'm doing
> really is a convolution. My attempts to use the convolution function in
> scipy is slower than my Cython code. Maybe I'm doing something wrong? The
> line below is how I'm calling the convolution:
> smooth_hist = ndimage.filters.convolve(data, weights, mode='constant',
> cval=0.0, origin=0)

Hmm, I think what is happening here is that your custom cython code is
making an optimization that you do not loop over the weights when data
has a zero array element. This relates to my other question about how
many values are nonzero in the data array. If it is very sparse, then
your cython code will probably be as good as you can do with
non-parallel code (I don't have any experience with parallelizing
things so I can't help there).

>> If you try that, is it enough? If not, can you be more specific as to
>> what cases you have where the performance is bad? Specifically: what
>> size is the data array? what size is the kernel? what number of points
>> are non zero in the data array?
> Currently I'm using this on a grid that's approxiately 800x600 with a kernel
> of about half that (Gaussian function with sigma of ~40km). This grid is
> essentially the eastern half of the United States at a grid spacing of 4km.
> On this grid, the Cython code is plenty fast. However, as I move toward
> dealing with finer meshes, my grid goes from 800x600 to closer to 5000x3000.
> Again, when dealing with binary, discrete data, the Cython routine is fairly
> quick. However, as the data become closer to continuous fields (say a
> temperature field instead of tornado tracks), the Cython code's performance
> decreases fairly quickly. When compared to using Gaussian filters (which I
> understand to be convolutions), the Cython code is substantially slower. The
> Cython code is still significantly slower than my workflow of "rotate, use
> gaussian filter, rotate back". The problem with the rotate workflow is that
> I'm uncomfortable with losing data in the rotations.
> (To see one application of what I'm doing, here's a paper in Weather and
> Forecasting describing what I'm doing:
> http://www.patricktmarsh.com/research/pubs/refereed/marshetal2012_precip.pdf)
> I'm currently proceeding with the Cython code as it's sufficient for what
> I'm doing right now. However, I was thinking of down the road. I wasn't sure
> where Scipy's Gaussian filters were getting such a bigger speed up than my
> Cython code.

Can you perhaps post some numbers? I'm curious how they compare. I
just skimmed through the SciPy code and it looks like there are some
other optimizations that the gaussian filter makes. Since it does not
implement the rotated gaussian, it can split it up into two successive
1-d correlations (because the 2-D gaussian is factorable, I think),
which I think saves a lot of looping. There is another optimization in
there that cuts the loop in half if the weights are symmetric, etc.

Anyway, given the size of your weights and kernels, the FFT approach
might be much, much faster (see David's comment). I think this would
be a lot simpler than rotate-filter-rotate. I think I have a matlab
script somewhere that does the FFT variant, if it would help. IIRC
there are a few padding/shifting issues that crop up to get it to
closely match what comes out of convolve.


More information about the SciPy-User mailing list