[Scipy-tickets] [SciPy] #1854: scipy.stats.gaussian_kde.integrate functions are slow

Fri Mar 1 07:27:50 CST 2013

#1854: scipy.stats.gaussian_kde.integrate functions are slow
Comment(by josefpktd):

 Usage: I don't think gaussian_kde is an efficient approach for
 one-dimensional datasets this large.

 For example, statsmodels (and some other packages) use an FFT on the binned
 data, which is much faster. IIRC, statsmodels only uses integrate.quad to
 get the cdf from the kde; I don't know if an FFT could be used for that.
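
 A minimal NumPy-only sketch of the binned FFT idea, for illustration. It
 is not the statsmodels implementation: the function name fft_kde, the
 gridsize and cut parameters, and the rule-of-thumb bandwidth are all
 assumptions made for this example.

```python
import numpy as np

def fft_kde(data, gridsize=1024, bw=None, cut=3.0):
    """Approximate a 1-D Gaussian KDE by binning the data and convolving via FFT.

    Hypothetical helper for illustration only.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    if bw is None:
        # Assumed rule-of-thumb bandwidth: sample std times n**(-1/5).
        bw = data.std(ddof=1) * n ** (-1.0 / 5.0)

    # Pad the grid by `cut` bandwidths so little kernel mass is lost at the ends.
    lo = data.min() - cut * bw
    hi = data.max() + cut * bw

    # Bin the data onto a regular grid.
    counts, edges = np.histogram(data, bins=gridsize, range=(lo, hi))
    centers = 0.5 * (edges[:-1] + edges[1:])
    dx = edges[1] - edges[0]

    # Gaussian kernel sampled at the grid spacing, truncated at 4 bandwidths.
    half = int(np.ceil(4.0 * bw / dx))
    offsets = np.arange(-half, half + 1) * dx
    kernel = np.exp(-0.5 * (offsets / bw) ** 2) / (bw * np.sqrt(2.0 * np.pi))

    # Zero-padded FFT product gives the linear convolution of counts and kernel.
    size = gridsize + kernel.size - 1
    conv = np.fft.irfft(np.fft.rfft(counts, size) * np.fft.rfft(kernel, size), size)

    # Drop the leading `half` samples to re-centre the result on the bin centres.
    density = conv[half:half + gridsize] / n
    return centers, density

rng = np.random.default_rng(0)
sample = rng.normal(size=100_000)
grid, density = fft_kde(sample)
```

 The cost is one histogram pass plus a few FFTs of length about gridsize,
 so it does not grow with the number of evaluation points times the number
 of observations. The price is a small binning error that depends on the
 grid spacing.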

 Also, unless you want tail probabilities where there are few observations,
 using the empirical cdf (just counting the number of points in the
 interval) might be pretty accurate with this many points.
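
 The counting approach can be sketched as follows. The helper name
 empirical_interval_prob is made up for this example; sorting once and
 using np.searchsorted is just one way to do the count.

```python
import numpy as np

def empirical_interval_prob(data, low, high):
    """Fraction of observations falling in the closed interval [low, high].

    Hypothetical helper for illustration only.
    """
    data = np.sort(np.asarray(data, dtype=float))
    # Number of points <= high minus number of points < low.
    n_upto_high = np.searchsorted(data, high, side="right")
    n_below_low = np.searchsorted(data, low, side="left")
    return (n_upto_high - n_below_low) / data.size

prob = empirical_interval_prob([1.0, 2.0, 3.0, 4.0, 5.0], 2.0, 4.0)
```

 After the one-time sort, each interval query is two binary searches, so
 many intervals can be evaluated cheaply on the same data.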

 (aside: statsmodels has a kernel estimator for the cdf directly, but that
 will be slow since it loops over points.)
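
 To show why a direct kernel cdf estimator is slow, here is a pure-Python
 sketch with a Gaussian kernel. It is not the statsmodels code; the name
 kernel_cdf and the explicit bw argument are assumptions for this example.

```python
import math

def kernel_cdf(data, x, bw):
    """Gaussian-kernel estimate of the cdf at a single point x.

    Hypothetical helper for illustration only; bw is the kernel bandwidth.
    """
    total = 0.0
    for xi in data:
        # Each observation contributes the normal cdf of (x - xi) / bw.
        total += 0.5 * (1.0 + math.erf((x - xi) / (bw * math.sqrt(2.0))))
    return total / len(data)

value = kernel_cdf([-1.0, 1.0], 0.0, 1.0)
```

 Every evaluation touches every observation, so computing the cdf at m
 points costs on the order of n * m kernel evaluations.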

