[SciPy-user] interpolate

Robert Kern robert.kern@gmail....
Fri Nov 14 22:25:02 CST 2008

On Fri, Nov 14, 2008 at 22:16, David Warde-Farley <dwf@cs.toronto.edu> wrote:
> On 14-Nov-08, at 6:26 PM, Anne Archibald wrote:
>> The knots are specified in a form that allows them all to be treated
>> identically. This sometimes means repeating knots or having zero
>> coefficients.
>> If you have more data points than you want knots, then you are going
>> to be producing a spline which does not pass through all the data. The
>> smoothing splines include an automatic number-of-knots selector, which
>> you may prefer to specifying the number of knots yourself. It chooses
>> (approximately) the minimum number of knots needed to let the curve
>> pass within one sigma of the data points, so by adjusting the
>> smoothing parameter and the weights you can tune the number of knots.
>> Evaluation time is not particularly sensitive to the number of knots
>> (though of course memory usage is).
> I see. What I'm interested in doing is modeling the variation in the
> curves, presumably via a description of the joint distribution of the
> spline coefficients.  This gets difficult if the number of knots is
> variable, which is why I've gone this route.  It's not important that
> the curves fit the data exactly, but part of the reason for fitting
> splines is to reduce each of many, many curves to a fixed-length
> description. Does this make sense?
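[For readers of the archive: the two approaches discussed above can be sketched with scipy.interpolate.splrep. This is an illustrative sketch, not code from the thread; the data, sigma, and knot positions are made up, and the s = len(x) convention is the usual FITPACK rule of thumb for unit-sigma-weighted data.]

```python
import numpy as np
from scipy.interpolate import splrep

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
sigma = 0.1
y = np.sin(x) + rng.normal(scale=sigma, size=x.size)

# 1) Smoothing spline with automatic knot selection: with weights
#    w = 1/sigma, the conventional choice s = len(x) asks FITPACK for
#    (roughly) the fewest knots that keep the fit within about one
#    sigma of the data; tuning s and w tunes the number of knots.
w = np.full_like(x, 1.0 / sigma)
t_auto, c_auto, k = splrep(x, y, w=w, s=len(x))

# 2) Fixed interior knots: fitting every curve against the same knot
#    vector gives every fit a coefficient vector of the same length --
#    a fixed-length descriptor for each curve, at the cost of not
#    passing exactly through the data.
interior = np.linspace(1, 9, 8)  # 8 hand-chosen interior knots (illustrative)
t_fixed, c_fixed, _ = splrep(x, y, t=interior)

print(len(t_auto), len(t_fixed))
```

With a cubic spline (k = 3) the fixed-knot fit always returns a knot vector of length 8 + 2*(k+1) = 16, so the coefficient array has the same length for every curve fitted this way.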

I'm not entirely sure how applicable this paper is to your problem,
but it does have an approach for dealing with varying numbers of knots
in an MCMC context:


An Implementation of Bayesian Adaptive Regression Splines (BARS) in C
with S and R Wrappers

BARS (DiMatteo, Genovese, and Kass 2001) uses the powerful
reversible-jump MCMC engine to perform spline-based generalized
nonparametric regression. It has been shown to work well in terms of
having small mean-squared error in many examples (smaller than known
competitors), as well as producing visually-appealing fits that are
smooth (filtering out high-frequency noise) while adapting to sudden
changes (retaining high-frequency signal). However, BARS is
computationally intensive. The original implementation in S was too
slow to be practical in certain situations, and was found to handle
some data sets incorrectly. We have implemented BARS in C for the
normal and Poisson cases, the latter being important in
neurophysiological and other point-process applications. The C
implementation includes all needed subroutines for fitting Poisson
regression, manipulating B-splines (using code created by Bates and
Venables), and finding starting values for Poisson regression (using
code for density estimation created by Kooperberg). The code utilizes
only freely-available external libraries (LAPACK and BLAS) and is
otherwise self-contained. We have also provided wrappers so that BARS
can be used easily within S or R.

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
