[SciPy-User] How to estimate error in polynomial coefficients from scipy.polyfit?
Charles R Harris
Thu Mar 25 19:17:02 CDT 2010
On Thu, Mar 25, 2010 at 4:40 PM, Jeremy Conlin <firstname.lastname@example.org> wrote:
> On Thu, Mar 25, 2010 at 4:21 PM, Anne Archibald
> <email@example.com> wrote:
> > On 25 March 2010 17:20, Jeremy Conlin <firstname.lastname@example.org> wrote:
> >> On Thu, Mar 25, 2010 at 2:52 PM, Charles R Harris
> >> <email@example.com> wrote:
> >>> On Thu, Mar 25, 2010 at 2:32 PM, Jeremy Conlin <firstname.lastname@example.org>
> >>>> I am using scipy.polyfit to fit a curve to my data. Members of this
> >>>> list have been integral in my understanding of how to use this
> >>>> function. Now I would like to know how I can get the uncertainties
> >>>> (standard deviations) of polynomial coefficients from the returned
> >>>> values from scipy.polyfit. If I understand correctly, the residuals
> >>>> are sometimes called the R^2 error, right? That gives an estimate of
> >>>> how well we have fit the data. I don't know how to use rank or any of
> >>>> the other returned values to get the uncertainties.
> >>>> Can someone please help?
> >>> You want the covariance of the coefficients, (A.T * A)^-1/var, where A
> >>> the design matrix. I'd have to see what the scipy fit returns to tell
> >>> more. In anycase, from that you can plot curves at +/- sigma to show
> >>> error bounds on the result. I can be more explicit if you want.
> >>> Chuck
> >> Thanks Chuck. That seems to get closer to what I need. I am just
> >> fitting data to a polynomial, nothing too fancy. I would like the
> >> variance (not the covariance) of the coeffficients. As far as what is
> >> returned from scipy.polyfit this is what the documentation says:
> > Just an important warning: for polynomials, especially high degree
> > polynomials, the coefficients are an awful way to specify them. If
> > what you really need is the coefficients, then you're stuck with it,
> > but if what you need is to estimate interpolated values, there are
> > better approaches.
> > I mention this in particular because you talk about needing "variance
> > rather than covariance". When you do a linear fit of a
> > several-parameter model to data with normal errors, the uncertainty in
> > the fitted parameters is a multidimensional Gaussian - that is, the
> > (say) one-sigma error region is a multidimensional ellipsoid. If you
> > chose a good parameterization for your model, this ellipsoid will line
> > up nicely with the axes, and you can describe it by its size along
> > each axis: that is, you only need a variance in each parameter.
> > But if your parameterization is not good, then this ellipse will be
> > some tilted long skinny shape, and taking its size along one axis can
> > drastically underestimate its size. You also have the problem that
> > there are correlations in your parameters: if one is high, say, then
> > the other is also likely to be high. The covariance matrix captures
> > all this information along with the variances you asked for. As you
> > might expect, life is vastly simpler if the ellipsoid is aligned with
> > the axes, so that this matrix is nearly diagonal and you can just read
> > the variances off the diagonal.
> > Unfortunately, the parameterization of polynomials by their
> > coefficients is not "good" in this sense: you almost always get very
> > long skinny non-aligned error ellipses. An easy example is if you're
> > fitting positive x,y pairs with a straight line y=mx+b. Then if m is
> > too low, b is almost certainly too high, since the uncertainty pivots
> > the line around the middle of the data set. With linear data you can
> > help this by using y = m(x-x_middle)+b, but when you go to
> > higher-order polynomials it all becomes messier and you can't usually
> > cure the problems this easily.
> > I would say, look closely at your problem and think about whether you
> > really need the polynomial coefficients.
> Yikes! This sounds like it may be more trouble than it's worth. I
> have a few sets of statistical data that each need to have curves fit
> to them. I can see what the parameters are and they are similar
> between each set. I want to know if the coefficients are within a few
> standard deviations of each other. That's what I was hoping to get
> at, but it seems to be more trouble than I'm willing to accept.
> Unless anyone has a simpler solution.
> > Anne
> >> residuals, rank, singular_values, rcond : present only if `full` = True
> >> Residuals of the least-squares fit, the effective rank of the
> >> Vandermonde coefficient matrix, its singular values, and the
> >> value of `rcond`. For more details, see `linalg.lstsq`.
> >> I don't think any of these things are "design matrix" as you have
> >> indicated I need. The documentation for linalg.lstsq does not say
> >> what rcond is unfortunately. Any ideas?
lstsq is a bit deficient in that regard, it doesn't return all the info you
need. The design matrix is the matrix A in the equation Ac=y that you want
to solve using least squares. In the case of polynomial fits it is the
Vandermode matrix of the desired degree given by vander(x, ndeg + 1), where
x is the vector of the sample points. To get error estimates you can either
just use the package Josef mentioned or you can go ahead and roll your own,
which is what I generally do. If your sample points are reasonably
distributed and the fit is of low degree you can explicitly compute dot(A.T,
A).inv and multiply it by the residual/(number of data points - number of
coefficients). That gives the covariance, the square root of its diagonal is
the estimated standard deviation of the coefficients. Note that this method
of estimating the variance from the residual isn't very accurate unless you
have a lot of data points, it's better to have independent estimates of the
measurement errors if you can manage that.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the SciPy-User