[SciPy-Dev] scipy.stats.kde

Sam Birch sam.m.birch@gmail....
Fri Aug 27 14:27:45 CDT 2010


> Bandwidth selection is a hotly debated topic, at least in one
> dimension, so perhaps not just different methods but tools for
> diagnosing bandwidth selection problems would be nice - at the least,
> it should be made straightforward to vary the bandwidth (e.g. to plot
> the KDE with a range of different bandwidth values).

Well, by allowing users to supply a custom bandwidth matrix, they can vary it
themselves, no?
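
To make "vary it themselves" concrete, here is a rough sketch in plain NumPy. It is
not the existing scipy.stats.kde code: the function name gaussian_kde_1d, the
sample data, and the list of bandwidths are all made up for illustration, and it
only covers the 1-D case with a scalar bandwidth rather than a full matrix.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a 1-D Gaussian KDE with an explicit bandwidth on `grid`.

    Hypothetical helper for illustration; not part of scipy.stats.kde.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance from every grid point to every sample,
    # shape (n_grid, n_samples).
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale so the estimate integrates to one.
    return kernel.sum(axis=1) / (samples.size * bandwidth)

# Made-up bimodal sample, just to have something to smooth.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.5, 200),
                          rng.normal(1.0, 1.0, 300)])
grid = np.linspace(-6.0, 6.0, 401)

# One estimate per bandwidth; plotting these together shows the
# under-smoothed and over-smoothed extremes side by side.
bandwidths = [0.05, 0.2, 0.5, 1.0]
estimates = {h: gaussian_kde_1d(samples, grid, h) for h in bandwidths}
```

Plotting each entry of estimates against grid gives the "range of different
bandwidth values" picture from the quoted paragraph.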


> At the other end of the spectrum, for very dense KDEs, on the circle I
> found it extremely convenient to use Fourier transforms to carry out
> the convolution of kernel with points. In particular, I represented
> the KDE in terms of its Fourier coefficients, so that an inverse FFT
> immediately gave me the KDE evaluated on a grid (or, with some
> fiddling, integrated over the bins of a histogram). I don't know
> whether this is a useful optimization for KDEs on the line or in
> higher dimensions, since there's the problem of wrapping.

That sounds very interesting. Sorry if I'm being dense (or just wrong, or
both), but do you convolve post-FFT or before? If before, why does the FFT
make it easier?
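
For reference, one possible reading of the Fourier approach in the quoted
paragraph, as a sketch only. This is a guess at the method, not the code
described there. It assumes a wrapped normal kernel (chosen here because its
Fourier coefficients have the closed form exp(-k^2 sigma^2 / 2)); the function
name circular_kde_fft and all parameter values are hypothetical. In this
reading the convolution becomes a multiplication of Fourier coefficients, and
a single inverse FFT then evaluates the KDE on a regular grid.

```python
import numpy as np

def circular_kde_fft(angles, sigma, n_grid=256):
    """KDE on the circle via Fourier coefficients (illustrative sketch).

    Assumes a wrapped normal kernel of width `sigma` (radians) that is
    smooth enough for its coefficients to be negligible at harmonic
    n_grid // 2, where the series is truncated.
    """
    angles = np.asarray(angles, dtype=float)
    k = np.arange(n_grid // 2 + 1)  # non-negative harmonics only
    # Fourier coefficients of the empirical distribution of the points.
    coeffs = np.exp(-1j * np.outer(k, angles)).mean(axis=1)
    # Convolution with the kernel is a product in Fourier space.
    coeffs = coeffs * np.exp(-0.5 * (k * sigma) ** 2)
    # irfft divides by n_grid, so undo that and apply the 1 / (2 pi)
    # normalisation of the Fourier series of a density on the circle.
    density = np.fft.irfft(coeffs, n=n_grid) * n_grid / (2.0 * np.pi)
    grid = 2.0 * np.pi * np.arange(n_grid) / n_grid
    return grid, density

# Made-up sample of angles concentrated around 1 radian.
rng = np.random.default_rng(1)
angles = rng.vonmises(1.0, 4.0, size=500)
grid, density = circular_kde_fft(angles, sigma=0.3)
```

Because everything is periodic, wrapping is handled by the Fourier series
itself on the circle; on the line the same trick would presumably need
padding to keep the two ends from bleeding into each other.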

-Sam

On Fri, Aug 27, 2010 at 2:48 PM, Anne Archibald
<aarchiba@physics.mcgill.ca> wrote:

> My only experience with KDEs has been on the circle, where there seems
> to be little or no literature and the constraints are rather
> different.
>
> On 27 August 2010 14:38,  <josef.pktd@gmail.com> wrote:
> > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch <sam.m.birch@gmail.com> wrote:
> >> Hi all,
> >> I was thinking of renovating the kernel density estimation package (although
> >> no promises; I'm leaving for college tomorrow morning!). I was wondering:
> >> a) whether anyone had started code in that direction
> >
> > Mike Crowe wrote code for kernel regression and Skipper started a 1D
> > kernel density estimator in scikits.statsmodels, which cover a larger
> > number of kernels.
> >
> > I don't think I have seen any higher dimensional kernel density
> > estimation in python besides scipy.stats.kde. The Gaussian kde in
> > scipy.stats is targeted to the underlying Fortran code for
> > multivariate normal cdf.
> > It's not clear to me what other n-dimensional kdes would require or
> > whether they would fit well with the current code.
> >
> > One extension that Robert also mentioned in the past is adaptive
> > kernels, which would be nice to have and which I also haven't seen in
> > python yet.
> >
> >> b) what people want in it
> >> I was thinking (as an ideal, not necessarily goal):
> >> - Support for more than Gaussian kernels (e.g. custom,
> >> uniform, Epanechnikov, triangular, quartic, cosine, etc.)
> >> - More options for bandwidth selection (custom bandwidth matrices, AMISE
> >> optimization, cross-validation, etc.)
> >
> > definitely yes, I don't think they are even available for 1D yet.
>
> Bandwidth selection is a hotly debated topic, at least in one
> dimension, so perhaps not just different methods but tools for
> diagnosing bandwidth selection problems would be nice - at the least,
> it should be made straightforward to vary the bandwidth (e.g. to plot
> the KDE with a range of different bandwidth values).
>
> >> - Assorted conveniences: automatically generate the mesh, limit the kernel's
> >> support for speed
> >
> > Using scipy.spatial to limit the number of neighbors in a bounded
> > support kernel might be a good idea.
>
> Simply using it to find the neighbors that need to be used should
> speed things up. There may also be some shortcuts for
> unbounded-support kernels (no point adding a Gaussian a hundred sigma
> away if there are any points nearby).
>
> At the other end of the spectrum, for very dense KDEs, on the circle I
> found it extremely convenient to use Fourier transforms to carry out
> the convolution of kernel with points. In particular, I represented
> the KDE in terms of its Fourier coefficients, so that an inverse FFT
> immediately gave me the KDE evaluated on a grid (or, with some
> fiddling, integrated over the bins of a histogram). I don't know
> whether this is a useful optimization for KDEs on the line or in
> higher dimensions, since there's the problem of wrapping.
>
> Anne
>
> > (just some thought on the topic)
> >
> > Josef
> >
> >> So, thoughts anyone? I figure it's better to over-specify and then
> >> under-produce, so don't hold back.
> >> Thanks,
> >> Sam
> >> _______________________________________________
> >> SciPy-Dev mailing list
> >> SciPy-Dev@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/scipy-dev
> >>