[SciPy-user] gaussian_kde bandwidth?

Robert Kern robert.kern at gmail.com
Sat Dec 31 19:33:07 CST 2005


On 12/30/05, Gary <pajer at iname.com> wrote:
> Can one specify a bandwidth for the kernel in scipy.stats.gaussian_kde?
>
> What is the default bandwidth? I checked the source code, but it was
> too obscure for me.
>
> If it is fixed, what is the reasoning?  Don't I *want* to be able to
> adjust it?

The default uses Scott's Rule to calculate an "optimal" bandwidth
(one that minimizes the asymptotic mean integrated squared error when
the true density is Gaussian, I believe). You can change how the
bandwidth is calculated by subclassing gaussian_kde and overriding the
covariance_factor method. The module was written for an application
for which this was sufficient, and then it was contributed to scipy.
There is certainly room for making it more sophisticated. There are
endlessly many ways to do bandwidth selection, which is frequently a
bad thing.
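
As a rough illustration of the override, here is a minimal sketch. It
assumes that gaussian_kde calls self.covariance_factor() while computing
the kernel covariance in its constructor and stores the result as the
factor attribute. FixedFactorKDE and fixed_factor are made-up names for
this example, not part of scipy.

```python
import numpy as np
from scipy.stats import gaussian_kde


class FixedFactorKDE(gaussian_kde):
    """gaussian_kde with a hand-picked bandwidth factor (hypothetical helper)."""

    def __init__(self, dataset, fixed_factor):
        # Store the factor first: the base __init__ computes the kernel
        # covariance right away and calls covariance_factor() while doing so.
        self.fixed_factor = fixed_factor
        super().__init__(dataset)

    def covariance_factor(self):
        # The returned scalar is squared and multiplied into the data
        # covariance, so it scales the kernel width relative to the
        # spread of the data.
        return self.fixed_factor


rng = np.random.default_rng(0)
data = rng.normal(size=200)

default_kde = gaussian_kde(data)        # Scott's Rule factor
narrow_kde = FixedFactorKDE(data, 0.1)  # much narrower kernels

# Scott's factor for n points in d dimensions is n**(-1/(d+4)).
n, d = data.size, 1
print(default_kde.factor, n ** (-1.0 / (d + 4)))
print(narrow_kde.factor)

# Both objects are evaluated the same way.
grid = np.linspace(-3, 3, 7)
print(default_kde(grid))
print(narrow_kde(grid))
```

A smaller factor gives a spikier estimate and a larger one a smoother
estimate; the 0.1 above is an arbitrary value chosen only to make the
difference visible.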

Here's the TODO list for kde.py:

* Split out univariate from multivariate; some approaches are much
easier for univariate KDE than for multivariate KDE, and some are only
possible in the univariate case.

* Provide more ways to select a bandwidth including k-nearest
neighbors (univariate only).

* Add more kernels besides Gaussians.

I probably won't be getting to all of these, so contributions are welcome.

--
Robert Kern
robert.kern at gmail.com


