[SciPy-Dev] Expanding Scipy's KDE functionality
Wed Jan 23 14:30:19 CST 2013
That looks like a nice implementation. My concern about adding it to
scipy is twofold:
1) Is this a well-known and well-proven technique, or is it more
cutting-edge? My view is that scipy should not seek to implement every
cutting-edge algorithm: in the long run this will lead to code bloat and
maintenance difficulties. If it is the latter, your code might be a
better fit for statsmodels or another more specialized package.
2) The algorithm seems limited to one or maybe two dimensions.
scipy.stats.gaussian_kde is designed for N dimensions, so it may be
hard to fit this bandwidth selection method into it. One option
might be to allow this bandwidth selection method via a flag in
scipy.stats.gaussian_kde and to raise an error if the dimensionality is
too high. To do that, your code would need to be reworked fairly
extensively to fit into the gaussian_kde class.
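For the flag-based option in point 2, it may be worth noting that gaussian_kde already accepts a callable as bw_method: it receives the gaussian_kde instance and returns the scalar factor, and the kernel covariance is the data covariance multiplied by that factor squared. A minimal sketch, assuming a selector that returns an absolute 1-D bandwidth, could convert it to that factor and raise for higher dimensions. The function names and the max_dim parameter are hypothetical, and Silverman's rule of thumb stands in for the diffusion-based selector only so the code runs.

```python
import numpy as np
from scipy.stats import gaussian_kde


def hypothetical_select_bandwidth_1d(x):
    """Placeholder for a 1-D selector returning an absolute bandwidth h.

    A diffusion-based selector would go here; Silverman's rule of thumb
    is used only so that this sketch runs.
    """
    n = x.size
    sd = np.std(x, ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    return 0.9 * min(sd, iqr / 1.34) * n ** (-0.2)


def one_d_bw_method(kde, max_dim=1):
    """Callable for gaussian_kde(bw_method=...) that refuses high dimensions."""
    if kde.d > max_dim:
        raise ValueError(
            f"this bandwidth selector supports at most {max_dim} "
            f"dimension(s), got {kde.d}"
        )
    x = kde.dataset[0]
    h = hypothetical_select_bandwidth_1d(x)
    # gaussian_kde scales the data covariance by factor**2, so express the
    # absolute bandwidth h relative to the sample standard deviation.
    return h / np.std(x, ddof=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kde = gaussian_kde(rng.normal(size=500), bw_method=one_d_bw_method)
    print(kde.factor)
```

With this design the dimensionality check lives in the selector itself, so gaussian_kde would not need a new code path; a dedicated flag would mainly add a string name for the method.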
I'd like other devs to weigh in on the algorithm, especially my
concern #1, before any work starts on a scipy PR. Thanks,
On 01/23/2013 12:11 PM, Daniel Smith wrote:
> This came up on a different thread, but I thought I would start a
> new thread focused on it. I have some existing code that
> implements the bandwidth selection algorithm from:
> Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density
> estimation via diffusion. The Annals of Statistics, 38(5):2916-2957,
> Zdravko Botev's MATLAB implementation can be found here:
> My code for that is here:
> I assume I need to find a workaround to avoid the float128 in
> the function fixed_point before I can add it to SciPy. I wrote the
> code a couple of years ago, so it will take me a moment to map out the
> best workaround (a very large number is being multiplied by a
> very small number). I can also add the 2D version once I start
> integrating with SciPy. I have a couple of remaining questions. First,
> should I implement this in SciPy? StatsModels? Both? Second, can I
> use Cython to generate C code for the function fixed_point, or do I
> need to write it using the NumPy C API?
> If there is somewhere else I should post this, or someone I should
> contact directly, please let me know.
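The float128 issue raised in the quoted message (a very large power multiplied by a very small exponential inside fixed_point) can usually be avoided without extended precision by adding the factors as logarithms and summing with scipy.special.logsumexp. The sketch below assumes that fixed_point follows the recurrence in Botev's MATLAB code, with I holding the squared frequencies and a2 the squared, halved DCT coefficients; the names log_power_sum and fixed_point_log are hypothetical, and the recurrence should be checked against the original code before relying on it.

```python
import numpy as np
from scipy.special import logsumexp


def log_power_sum(I, a2, s, t):
    """Return log(sum(I**s * a2 * exp(-I * pi**2 * t))) without forming I**s.

    The huge power I**s and the tiny factor exp(-I * pi**2 * t) are added
    as logarithms before exponentiating; logsumexp applies the a2 weights
    (zero weights contribute nothing).
    """
    exponents = s * np.log(I) - I * np.pi**2 * t
    return logsumexp(exponents, b=a2)


def fixed_point_log(t, N, I, a2, ell=7):
    """Log-space sketch of a fixed-point function of the form t - t_opt(t).

    t  : candidate value at which to evaluate the function
    N  : number of data points
    I  : squared frequencies 1, 4, 9, ..., (n - 1)**2 as float64
    a2 : squared, halved DCT coefficients, (a[1:] / 2)**2
    """
    # log f for the ell-th term: log(2 * pi**(2*ell) * sum(...))
    log_f = np.log(2.0) + 2 * ell * np.log(np.pi) + log_power_sum(I, a2, ell, t)
    for s in range(ell - 1, 1, -1):
        # K0 = (1 * 3 * 5 * ... * (2s - 1)) / sqrt(2 * pi)
        K0 = np.prod(np.arange(1, 2 * s, 2, dtype=float)) / np.sqrt(2 * np.pi)
        const = (1 + 0.5 ** (s + 0.5)) / 3
        # time = (2 * const * K0 / (N * f)) ** (2 / (3 + 2s)), in log space
        log_time = (2.0 / (3 + 2 * s)) * (np.log(2 * const * K0) - np.log(N) - log_f)
        log_f = (np.log(2.0) + 2 * s * np.log(np.pi)
                 + log_power_sum(I, a2, s, np.exp(log_time)))
    # t_opt = (2 * N * sqrt(pi) * f) ** (-2/5)
    log_t_opt = -0.4 * (np.log(2.0 * N) + 0.5 * np.log(np.pi) + log_f)
    return t - np.exp(log_t_opt)
```

Because every intermediate quantity stays as a logarithm until the last step, the sketch only needs float64, which would also make a Cython port simpler. A root of this function in t could then be located with a bracketing solver such as scipy.optimize.brentq.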