[SciPy-User] Weighted KDE

Jackson Li sonicboomed@yahoo....
Sun Jan 13 11:53:25 CST 2013


On Sun, Jan 13, 2013 at 10:44 AM, Joe Kington <joferkington@gmail.com> wrote:

>For what it's worth, the code you linked to is much slower for small
>sample sizes. It's only faster with large numbers (>1e4) of points. It
>also has a bit of a different use case than gaussian_kde. It's only
>intended for making a regularly gridded KDE of a very large number of
>points on a relatively fine grid. It bins the data onto a regular grid and
>convolves it with an appropriate gaussian kernel. This is a reasonable
>approximation when you're dealing with a large number of points, but not so
>reasonable if you only have a handful. Because the size of the gaussian
>kernel can be very large when the sample size is low, the convolution can
>be very slow for small sample sizes. Also, if I recall correctly, there's
>a stray flipud that got left in there. You'll want to take it out. (Also,
>while I think that got posted only a couple of years ago, I wrote it much
>longer ago than that... There's some less-than-ideal code in there...)
>
>However, are you sure that you want a kernel density estimate? What
>you're describing sounds like interpolation, not a weighted KDE.
>
>As an example, a weighted KDE would be used when you wanted to show the
>density of point estimates while weighting it by error in the location of
>the point.
>
>>I shouldn't have said "error in the location of the point". I guess it
>>would be more like "confidence that the point exists" or more accurately,
>>"magnitude of the point". Otherwise, the size of the Gaussian kernel would
>>have to change depending on the data involved.
>>
>>As another (not exact) example, it can be handy when you want to sum some
>>attribute over a map to yield a density estimate per-unit-area (e.g.
>>population density, where you have populations of cities as your point
>>measurements). In other words, if you want your temperature values to be
>>summed-per-unit-area, then it's what you want. If you want to interpolate,
>>it's not what you want.
>
>Instead, it sounds like you have a third variable that you want to make a
>continuous map of based on irregularly sampled points. If so, have a look
>at scipy.interpolate (and particularly scipy.interpolate.Rbf).
>
>Hope that helps,
>-Joe

Hi, 
Thanks for the quick reply. 
What you described for the population of cities is indeed what I want.
I have several data points spread out randomly in XY space, and each data point has an independent third variable.
(E.g. if two points lie very close to each other, one with a value of 50 and the other with a value of 10, and all other data points are far away, I would like that patch to get a value of 30, their average.)

Hence, I would like to obtain an XY graph showing the density estimate of the third variable.
(If that patch has mostly high temperatures on average, it should be "red"; if it is empty or has a lot of low-temperature data points, it should be "blue".)

Thank you!
Jackson
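
The two readings discussed above (values summed per unit area versus a local average such as 30 for the 50/10 pair) can be compared with a rough sketch along the lines of the bin-and-convolve approach Joe describes. This is not the code linked in the thread; the function name smoothed_sum_and_mean and its parameters (bins, sigma, extent) are made up for illustration, and sigma is assumed to be given in grid cells rather than data units.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smoothed_sum_and_mean(x, y, values, bins=100, sigma=3.0, extent=None):
    """Hypothetical helper: gridded weighted sum and kernel-weighted mean.

    Returns (smooth_sums, local_mean, xedges, yedges).
    """
    # Bin the point counts and the summed values onto the same regular grid.
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins, range=extent)
    sums, _, _ = np.histogram2d(x, y, bins=[xedges, yedges], weights=values)

    # Convolve both grids with the same Gaussian kernel (sigma in grid cells).
    smooth_counts = gaussian_filter(counts, sigma, mode="constant")
    smooth_sums = gaussian_filter(sums, sigma, mode="constant")

    # The ratio is a kernel-weighted average of the values.
    # Cells that no point contributes to are left as NaN ("empty").
    with np.errstate(invalid="ignore", divide="ignore"):
        local_mean = np.where(smooth_counts > 1e-12,
                              smooth_sums / smooth_counts,
                              np.nan)
    return smooth_sums, local_mean, xedges, yedges


# The example from the message: two nearby points with values 50 and 10.
x = np.array([0.485, 0.495])
y = np.array([0.51, 0.51])
values = np.array([50.0, 10.0])
summed, mean, xedges, yedges = smoothed_sum_and_mean(
    x, y, values, bins=50, sigma=3.0, extent=[[0, 1], [0, 1]]
)
print(np.nanmin(mean), np.nanmax(mean))
```

Here both points fall in the same grid cell, so the local mean around them comes out at about 30, while the smoothed sum spreads the total of 60 over the neighbouring cells. Which of the two maps is wanted depends on whether the third variable should be summed or averaged; for a handful of points, scipy.interpolate may still be the better fit, as Joe suggests.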