[SciPy-User] How to estimate error in polynomial coefficients from scipy.polyfit?
Thu Mar 25 17:40:06 CDT 2010
On Thu, Mar 25, 2010 at 4:21 PM, Anne Archibald
> On 25 March 2010 17:20, Jeremy Conlin <firstname.lastname@example.org> wrote:
>> On Thu, Mar 25, 2010 at 2:52 PM, Charles R Harris
>> <email@example.com> wrote:
>>> On Thu, Mar 25, 2010 at 2:32 PM, Jeremy Conlin <firstname.lastname@example.org> wrote:
>>>> I am using scipy.polyfit to fit a curve to my data. Members of this
>>>> list have been integral in my understanding of how to use this
>>>> function. Now I would like to know how I can get the uncertainties
>>>> (standard deviations) of polynomial coefficients from the returned
>>>> values from scipy.polyfit. If I understand correctly, the residuals
>>>> are sometimes called the R^2 error, right? That gives an estimate of
>>>> how well we have fit the data. I don't know how to use rank or any of
>>>> the other returned values to get the uncertainties.
>>>> Can someone please help?
>>> You want the covariance of the coefficients, (A.T * A)^-1/var, where A is
>>> the design matrix. I'd have to see what the scipy fit returns to tell you
>>> more. In anycase, from that you can plot curves at +/- sigma to show the
>>> error bounds on the result. I can be more explicit if you want.
>> Thanks Chuck. That seems to get closer to what I need. I am just
>> fitting data to a polynomial, nothing too fancy. I would like the
>> variance (not the covariance) of the coeffficients. As far as what is
>> returned from scipy.polyfit this is what the documentation says:
> Just an important warning: for polynomials, especially high degree
> polynomials, the coefficients are an awful way to specify them. If
> what you really need is the coefficients, then you're stuck with it,
> but if what you need is to estimate interpolated values, there are
> better approaches.
> I mention this in particular because you talk about needing "variance
> rather than covariance". When you do a linear fit of a
> several-parameter model to data with normal errors, the uncertainty in
> the fitted parameters is a multidimensional Gaussian - that is, the
> (say) one-sigma error region is a multidimensional ellipsoid. If you
> chose a good parameterization for your model, this ellipsoid will line
> up nicely with the axes, and you can describe it by its size along
> each axis: that is, you only need a variance in each parameter.
> But if your parameterization is not good, then this ellipse will be
> some tilted long skinny shape, and taking its size along one axis can
> drastically underestimate its size. You also have the problem that
> there are correlations in your parameters: if one is high, say, then
> the other is also likely to be high. The covariance matrix captures
> all this information along with the variances you asked for. As you
> might expect, life is vastly simpler if the ellipsoid is aligned with
> the axes, so that this matrix is nearly diagonal and you can just read
> the variances off the diagonal.
> Unfortunately, the parameterization of polynomials by their
> coefficients is not "good" in this sense: you almost always get very
> long skinny non-aligned error ellipses. An easy example is if you're
> fitting positive x,y pairs with a straight line y=mx+b. Then if m is
> too low, b is almost certainly too high, since the uncertainty pivots
> the line around the middle of the data set. With linear data you can
> help this by using y = m(x-x_middle)+b, but when you go to
> higher-order polynomials it all becomes messier and you can't usually
> cure the problems this easily.
> I would say, look closely at your problem and think about whether you
> really need the polynomial coefficients.
Yikes! This sounds like it may be more trouble than it's worth. I
have a few sets of statistical data that each need to have curves fit
to them. I can see what the parameters are and they are similar
between each set. I want to know if the coefficients are within a few
standard deviations of each other. That's what I was hoping to get
at, but it seems to be more trouble than I'm willing to accept.
Unless anyone has a simpler solution.
>> residuals, rank, singular_values, rcond : present only if `full` = True
>> Residuals of the least-squares fit, the effective rank of the scaled
>> Vandermonde coefficient matrix, its singular values, and the specified
>> value of `rcond`. For more details, see `linalg.lstsq`.
>> I don't think any of these things are "design matrix" as you have
>> indicated I need. The documentation for linalg.lstsq does not say
>> what rcond is unfortunately. Any ideas?
>> SciPy-User mailing list
> SciPy-User mailing list
More information about the SciPy-User