[SciPy-User] How to estimate error in polynomial coefficients from scipy.polyfit?
Thu Mar 25 19:23:51 CDT 2010
On Thu, Mar 25, 2010 at 6:45 PM, Anne Archibald
> On 25 March 2010 18:40, Jeremy Conlin <firstname.lastname@example.org> wrote:
>> On Thu, Mar 25, 2010 at 4:21 PM, Anne Archibald
>> <email@example.com> wrote:
>>> On 25 March 2010 17:20, Jeremy Conlin <firstname.lastname@example.org> wrote:
>>>> On Thu, Mar 25, 2010 at 2:52 PM, Charles R Harris
>>>> <email@example.com> wrote:
>>>>> On Thu, Mar 25, 2010 at 2:32 PM, Jeremy Conlin <firstname.lastname@example.org> wrote:
>>>>>> I am using scipy.polyfit to fit a curve to my data. Members of this
>>>>>> list have been integral in my understanding of how to use this
>>>>>> function. Now I would like to know how I can get the uncertainties
>>>>>> (standard deviations) of polynomial coefficients from the returned
>>>>>> values from scipy.polyfit. If I understand correctly, the residuals
>>>>>> are sometimes called the R^2 error, right? That gives an estimate of
>>>>>> how well we have fit the data. I don't know how to use rank or any of
>>>>>> the other returned values to get the uncertainties.
>>>>>> Can someone please help?
>>>>> You want the covariance of the coefficients, (A.T * A)^-1/var, where A is
>>>>> the design matrix. I'd have to see what the scipy fit returns to tell you
>>>>> more. In anycase, from that you can plot curves at +/- sigma to show the
>>>>> error bounds on the result. I can be more explicit if you want.
>>>> Thanks Chuck. That seems to get closer to what I need. I am just
>>>> fitting data to a polynomial, nothing too fancy. I would like the
>>>> variance (not the covariance) of the coeffficients. As far as what is
>>>> returned from scipy.polyfit this is what the documentation says:
>>> Just an important warning: for polynomials, especially high degree
>>> polynomials, the coefficients are an awful way to specify them. If
>>> what you really need is the coefficients, then you're stuck with it,
>>> but if what you need is to estimate interpolated values, there are
>>> better approaches.
>>> I mention this in particular because you talk about needing "variance
>>> rather than covariance". When you do a linear fit of a
>>> several-parameter model to data with normal errors, the uncertainty in
>>> the fitted parameters is a multidimensional Gaussian - that is, the
>>> (say) one-sigma error region is a multidimensional ellipsoid. If you
>>> chose a good parameterization for your model, this ellipsoid will line
>>> up nicely with the axes, and you can describe it by its size along
>>> each axis: that is, you only need a variance in each parameter.
>>> But if your parameterization is not good, then this ellipse will be
>>> some tilted long skinny shape, and taking its size along one axis can
>>> drastically underestimate its size. You also have the problem that
>>> there are correlations in your parameters: if one is high, say, then
>>> the other is also likely to be high. The covariance matrix captures
>>> all this information along with the variances you asked for. As you
>>> might expect, life is vastly simpler if the ellipsoid is aligned with
>>> the axes, so that this matrix is nearly diagonal and you can just read
>>> the variances off the diagonal.
>>> Unfortunately, the parameterization of polynomials by their
>>> coefficients is not "good" in this sense: you almost always get very
>>> long skinny non-aligned error ellipses. An easy example is if you're
>>> fitting positive x,y pairs with a straight line y=mx+b. Then if m is
>>> too low, b is almost certainly too high, since the uncertainty pivots
>>> the line around the middle of the data set. With linear data you can
>>> help this by using y = m(x-x_middle)+b, but when you go to
>>> higher-order polynomials it all becomes messier and you can't usually
>>> cure the problems this easily.
>>> I would say, look closely at your problem and think about whether you
>>> really need the polynomial coefficients.
>> Yikes! This sounds like it may be more trouble than it's worth. I
>> have a few sets of statistical data that each need to have curves fit
>> to them. I can see what the parameters are and they are similar
>> between each set. I want to know if the coefficients are within a few
>> standard deviations of each other. That's what I was hoping to get
>> at, but it seems to be more trouble than I'm willing to accept.
>> Unless anyone has a simpler solution.
> You could try using the chebyshev polynomials; they're in the new
> numpy, or you can just use the formulas (the degree-n one is cos(n
> arccos(x)), and they're sensible on [0,1]). They're not perfect, but
> they're a much better representation of polynomials that gets rid of
> most of the problems I described above (the covariances tend to be
> much less).
I don't think that multicollinearity matters so much when all
coefficients included in the null hypothesis, because the test is
based on the entire covariance of the parameter estimates and not just
the variance, e.g. with F-test.
One of the first references for testing equality of regression
coefficients across regressions that google found is
(google is swallowing my sent emails today, so it will come up double
or not at all)
>>>> residuals, rank, singular_values, rcond : present only if `full` = True
>>>> Residuals of the least-squares fit, the effective rank of the scaled
>>>> Vandermonde coefficient matrix, its singular values, and the specified
>>>> value of `rcond`. For more details, see `linalg.lstsq`.
>>>> I don't think any of these things are "design matrix" as you have
>>>> indicated I need. The documentation for linalg.lstsq does not say
>>>> what rcond is unfortunately. Any ideas?
>>>> SciPy-User mailing list
>>> SciPy-User mailing list
>> SciPy-User mailing list
> SciPy-User mailing list
More information about the SciPy-User