[SciPy-Dev] scipy.stats.kde

Anne Archibald aarchiba@physics.mcgill...
Fri Aug 27 13:48:29 CDT 2010


My only experience with KDEs has been on the circle, where there seems
to be little or no literature and the constraints are rather
different.

On 27 August 2010 14:38,  <josef.pktd@gmail.com> wrote:
> On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch <sam.m.birch@gmail.com> wrote:
>> Hi all,
>> I was thinking of renovating the kernel density estimation package (although
>> no promises; I'm leaving for college tomorrow morning!). I was wondering:
>> a) whether anyone had started code in that direction
>
> Mike Crowe wrote code for kernel regression, and Skipper started a 1D
> kernel density estimator in scikits.statsmodels, which covers a larger
> number of kernels.
>
> I don't think I have seen any higher dimensional kernel density
> estimation in python besides scipy.stats.kde. The Gaussian kde in
> scipy.stats is targeted to the underlying Fortran code for
> multivariate normal cdf.
> It's not clear to me what other n-dimensional kdes would require or
> whether they would fit well with the current code.
>
> One extension that Robert also mentioned in the past is adaptive
> kernels, which would be nice to have and which I also haven't seen
> in python yet.
>
>> b) what people want in it
>> I was thinking (as an ideal, not necessarily goal):
>> - Support for more than Gaussian kernels (e.g. custom,
>> uniform, Epanechnikov, triangular, quartic, cosine, etc.)
>> - More options for bandwidth selection (custom bandwidth matrices, AMISE
>> optimization, cross-validation, etc.)
>
> definitely yes, I don't think they are even available for 1D yet.

Bandwidth selection is a hotly debated topic, at least in one
dimension, so perhaps not just different methods but tools for
diagnosing bandwidth selection problems would be nice - at the least,
it should be made straightforward to vary the bandwidth (e.g. to plot
the KDE with a range of different bandwidth values).
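
As a rough illustration of varying the bandwidth, here is a minimal
sketch. It assumes the bw_method argument that current releases of
scipy.stats.gaussian_kde accept (a scalar is used as the bandwidth
factor); the sample data and the list of factors are made up for the
example.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical bimodal sample, for illustration only.
data = np.concatenate([rng.normal(-2.0, 0.5, 200),
                       rng.normal(1.5, 1.0, 300)])

grid = np.linspace(-5.0, 5.0, 401)
# Assumed set of bandwidth factors to compare. A scalar bw_method
# scales the sample standard deviation to give the kernel width.
factors = [0.05, 0.2, 0.5, 1.0]

curves = {}
for f in factors:
    kde = gaussian_kde(data, bw_method=f)
    curves[f] = kde(grid)  # density evaluated on the grid

# Each entry of `curves` can now be plotted against `grid`, e.g. with
# matplotlib, to see how the estimate changes from under- to
# over-smoothed.
```

Small factors give a spiky estimate that follows individual points;
large factors blur the two modes together.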

>> - Assorted conveniences: automatically generate the mesh, limit the kernel's
>> support for speed
>
> Using scipy.spatial to limit the number of neighbors in a bounded
> support kernel might be a good idea.

Simply using it to find the neighbors that need to be used should
speed things up. There may also be some shortcuts for
unbounded-support kernels (no point adding a Gaussian a hundred sigma
away if there are any points nearby).
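
The neighbor-lookup idea for a bounded-support kernel might look
something like the sketch below. It assumes a radially symmetric
Epanechnikov kernel with a single scalar radius h, and uses
scipy.spatial.cKDTree.query_ball_point to pick out only the points
that can contribute; the function name epanechnikov_kde is
hypothetical, not an existing scipy routine.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma


def epanechnikov_kde(points, queries, h):
    """KDE with a radial Epanechnikov kernel of radius h (a sketch).

    points  : (n, d) array of samples
    queries : (m, d) array of locations at which to evaluate
    """
    points = np.asarray(points, dtype=float)
    queries = np.asarray(queries, dtype=float)
    n, d = points.shape

    # Normalisation so that the kernel integrates to one in d dimensions.
    unit_ball = np.pi ** (d / 2) / gamma(d / 2 + 1)
    norm = (d + 2) / (2 * unit_ball)

    # Only points within distance h of a query have nonzero weight.
    tree = cKDTree(points)
    neighbours = tree.query_ball_point(queries, r=h)

    density = np.empty(len(queries))
    for i, idx in enumerate(neighbours):
        u2 = np.sum((points[idx] - queries[i]) ** 2, axis=1) / h**2
        weights = np.clip(1.0 - u2, 0.0, None)  # guard against rounding
        density[i] = norm * weights.sum() / (n * h**d)
    return density
```

The cost per query then depends on the number of points inside the
kernel's support rather than on the total sample size.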

At the other end of the spectrum, for very dense KDEs, on the circle I
found it extremely convenient to use Fourier transforms to carry out
the convolution of kernel with points. In particular, I represented
the KDE in terms of its Fourier coefficients, so that an inverse FFT
immediately gave me the KDE evaluated on a grid (or, with some
fiddling, integrated over the bins of a histogram). I don't know
whether this is a useful optimization for KDEs on the line or in
higher dimensions, since there's the problem of wrapping.
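
A minimal sketch of the Fourier-coefficient approach on the circle
follows. The choice of a wrapped normal kernel of width sigma is an
assumption made for the example (its Fourier coefficients are simply
exp(-k^2 sigma^2 / 2)), and the function name circular_kde_fft is
hypothetical.

```python
import numpy as np


def circular_kde_fft(angles, sigma, n_grid=512):
    """Wrapped-normal KDE on [0, 2*pi), evaluated on n_grid points.

    Returns (grid, density). Harmonics are truncated below the Nyquist
    limit of the grid, so n_grid should be large enough that
    exp(-0.5 * (n_grid / 2 * sigma)**2) is negligible.
    """
    angles = np.asarray(angles, dtype=float)
    k = np.arange(n_grid // 2)  # harmonics 0 .. n_grid/2 - 1

    # Fourier coefficients of the point set: mean_j exp(-i k theta_j).
    empirical = np.exp(-1j * np.outer(k, angles)).mean(axis=1)
    # Convolution with the kernel is a product of coefficients.
    coeffs = empirical * np.exp(-0.5 * (k * sigma) ** 2)

    grid = 2 * np.pi * np.arange(n_grid) / n_grid
    # irfft includes a 1/n_grid factor; rescale to a density per radian.
    density = np.fft.irfft(coeffs, n=n_grid) * n_grid / (2 * np.pi)
    return grid, density


# Usage with made-up data: 200 angles drawn from a von Mises distribution.
rng = np.random.default_rng(2)
theta = rng.vonmises(1.0, 2.0, size=200) % (2 * np.pi)
grid, density = circular_kde_fft(theta, sigma=0.3)
```

Because the zeroth coefficient is exactly one, the estimate on the
grid integrates to one over the circle by construction.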

Anne

> (just some thought on the topic)
>
> Josef
>
>> So, thoughts anyone? I figure it's better to over-specify and then
>> under-produce, so don't hold back.
>> Thanks,
>> Sam
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>
>>