Thu Sep 25 17:30:06 CDT 2008
I've just written a description of how scipy's smoothing splines work
(because a friend is publishing a paper that used them). I had to go
look up the original research papers, because there isn't really a
good description in the python documentation, or even in the FITPACK
source. It seems like it would be valuable to have such a description,
but I'm not sure where it should go: duplicated in every one of
scipy's spline-fitting functions? at the module level? in some
automatic docstring transmogrifier that puts it in all relevant
docstrings without requiring source code duplication?
The splines constructed by this code are described in a number of
papers (see below). Summarizing based on Dierckx 1982, the goal is to
find a spline for which the "smoothness" (sum of squares of jumps in
the highest derivative at the joins) is as small as possible given
that the (weighted) sum of squares of residuals is S. This can be
achieved by choosing one knot per data point, but in the interests of
efficiency and smoothness, the algorithm attempts to select the
smallest set of knots that can achieve this balance.
For any given set of knots, by adjusting the relative importance of
smoothness and good fitting the algorithm can go from the
least-squares polynomial of degree K (totally smooth) to the
least-squares spline (big jumps at each knot). The code tries to find
the smallest set of knots for which the least-squares spline gives you
residuals better than S, then adjusts the relative importance of
smoothing and quality-of-fit to make the curve smoother until the
residuals are exactly S.
It finds this best set of knots by starting with the bare minimum of
knots - a one-segment spline - and checking whether the least-squares
spline (with no smoothness constraints) fits the data as well as S. If
not, then it subdivides the interval by introducing a knot. The
subdivided spline is again checked to see whether it can fit the data
with sum of squares of residuals no worse than S. If not, the
worst-fitting subintervals are subdivided again. This is repeated
until there are enough knots to make it possible to fit with residuals
no worse than S. When this is finally achieved - if necessary by
having a knot at every data point so the spline interpolates them -
the spline is adjusted to make it smoother but the fit worse until a
spline is obtained with sum of squares of residuals exactly S. The
procedure does not attempt to produce the strictly minimal set of
knots, but it does stop introducing new knots as soon as possible.
The process is described in detail in:
Dierckx P. : An Algorithm for Smoothing, Differentiation and Integ-
ration Of Experimental Data Using Spline Functions,
J.Comp.Appl.Maths 1 (1975) 165-184.
Dierckx P. : A Fast Algorithm for Smoothing Data on a Rectangular
Grid While Using Spline Functions, Siam J.Numer.Anal.
19 (1982) 1286-1304.
Dierckx P. : An Improved Algorithm for Curve Fitting with Spline
Functions, Report TW54, Dept. Computer Science, K.U.
Dierckx P. : Curve and Surface Fitting with Splines, Monographs on
Numerical Analysis, Oxford University Press, 1993.
The code in scipy is a python wrapper of the FITPACK routines written by:
Dept. Computer Science, K.U. Leuven
Celestijnenlaan 200a, B-3001 Heverlee, Belgium.
e-mail : Paul.Dierckx@cs.kuleuven.ac.be
More information about the SciPy-user