[SciPy-user] KDE question

Robert Kern robert.kern@gmail....
Fri Nov 9 14:13:02 CST 2007


Zachary Pincus wrote:
> Hello all,
> 
> I'm using scipy.stats.gaussian_kde to estimate the a probability  
> density based on some scattered 2D samples. What I would like to do  
> eventually is plot a line that surrounds (say) 90% of the probability  
> density.
> 
> Clearly this line corresponds to some contour of the density function  
> (which I can extract out easily enough from KDE density estimates on  
> a grid), but I'm not sure how to figure out what exact contour value  
> it is that I want.
> 
> Given the KDE density estimates on a grid, I could empirically figure  
> out which value is closest, but I was wondering if there's a simpler  
> closed-form solution... (And apologies if it's somehow obvious; I'm  
> feeling a bit foggy-headed about this.)

I don't think there is a closed-form solution for n>1-D. It shouldn't be too
hard to evaluate the KDE on a grid, then search for the value between 0 and the
peak where the integral is around 90%.


from scipy import stats
from scipy.optimize import brentq

k = stats.gaussian_kde(data)
xy = ... # the grid
dxdy = ... # the area represented by each grid point
z = k(xy)
peak = z.max()

# You can't quite do this since the function is discontinuous, and never
# actually equals 0, but you can do a search along these lines.
# Exercise left for the reader.
def f(s):
    mask = z > s
    return 0.9 - (z[mask].sum() * dxdy)

brentq(f, 0, peak)

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco


More information about the SciPy-user mailing list