[SciPy-User] How to estimate error in polynomial coefficients from scipy.polyfit?

Jeremy Conlin jlconlin@gmail....
Thu Mar 25 17:40:06 CDT 2010


On Thu, Mar 25, 2010 at 4:21 PM, Anne Archibald
<peridot.faceted@gmail.com> wrote:
> On 25 March 2010 17:20, Jeremy Conlin <jlconlin@gmail.com> wrote:
>> On Thu, Mar 25, 2010 at 2:52 PM, Charles R Harris
>> <charlesr.harris@gmail.com> wrote:
>>>
>>>
>>> On Thu, Mar 25, 2010 at 2:32 PM, Jeremy Conlin <jlconlin@gmail.com> wrote:
>>>>
>>>> I am using scipy.polyfit to fit a curve to my data.  Members of this
>>>> list have been integral in my understanding of how to use this
>>>> function.  Now I would like to know how I can get the uncertainties
>>>> (standard deviations) of polynomial coefficients from the returned
>>>> values from scipy.polyfit.  If I understand correctly, the residuals
>>>> are sometimes called the R^2 error, right?  That gives an estimate of
>>>> how well we have fit the data.  I don't know how to use rank or any of
>>>> the other returned values to get the uncertainties.
>>>>
>>>> Can someone please help?
>>>
>>> You want the covariance of the coefficients, var * (A.T A)^-1, where A
>>> is the design matrix and var is the residual variance. I'd have to see
>>> what the scipy fit returns to tell you more. In any case, from that you
>>> can plot curves at +/- sigma to show the error bounds on the result. I
>>> can be more explicit if you want.
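>>>
>>> For instance, something like this (a rough, untested sketch; x and y
>>> are your data arrays, with independent, equal-variance errors
>>> assumed):
>>>
>>>     import numpy as np
>>>
>>>     deg = 2                                    # example degree
>>>     c, resid, rank, sv, rcond = np.polyfit(x, y, deg, full=True)
>>>     A = np.vander(x, deg + 1)                  # design matrix
>>>     dof = len(x) - (deg + 1)                   # degrees of freedom
>>>     var = resid[0] / dof                       # residual variance
>>>     cov = var * np.linalg.inv(np.dot(A.T, A))  # coefficient covariance
>>>     sigma = np.sqrt(np.diag(cov))              # one-sigma per coefficient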
>>>
>>> Chuck
>>
>> Thanks Chuck.  That seems to get closer to what I need.  I am just
>> fitting data to a polynomial, nothing too fancy.  I would like the
>> variance (not the covariance) of the coefficients.  As for what is
>> returned from scipy.polyfit, this is what the documentation says:
>
> Just an important warning: for polynomials, especially high-degree
> polynomials, the coefficients are an awful way to specify them. If
> what you really need is the coefficients, then you're stuck with it,
> but if what you need is to estimate interpolated values, there are
> better approaches.
>
> I mention this in particular because you talk about needing "variance
> rather than covariance". When you do a linear fit of a
> several-parameter model to data with normal errors, the uncertainty in
> the fitted parameters is a multidimensional Gaussian - that is, the
> (say) one-sigma error region is a multidimensional ellipsoid. If you
> choose a good parameterization for your model, this ellipsoid will
> line up nicely with the axes, and you can describe it by its size
> along each axis: that is, you only need a variance for each parameter.
>
> But if your parameterization is not good, then this ellipsoid will be
> some tilted, long, skinny shape, and taking its extent along one axis
> can drastically underestimate the true uncertainty. You also have the
> problem that your parameters are correlated: if one is high, say,
> then the other is also likely to be high. The covariance matrix captures
> all this information along with the variances you asked for. As you
> might expect, life is vastly simpler if the ellipsoid is aligned with
> the axes, so that this matrix is nearly diagonal and you can just read
> the variances off the diagonal.
>
> Unfortunately, the parameterization of polynomials by their
> coefficients is not "good" in this sense: you almost always get very
> long skinny non-aligned error ellipses. An easy example is if you're
> fitting positive x,y pairs with a straight line y=mx+b. Then if m is
> too low, b is almost certainly too high, since the uncertainty pivots
> the line around the middle of the data set. With linear data you can
> help this by using y = m(x-x_middle)+b, but when you go to
> higher-order polynomials it all becomes messier and you can't usually
> cure the problems this easily.
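>
> For instance, a quick (untested) toy illustration of that m-b
> correlation and how centering removes it:
>
>     import numpy as np
>
>     x = np.linspace(10.0, 20.0, 50)          # positive abscissa
>     for xs in (x, x - x.mean()):             # raw vs. centered x
>         A = np.vander(xs, 2)                 # design matrix [x, 1]
>         cov = np.linalg.inv(np.dot(A.T, A))  # error-ellipse shape
>         r = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
>         print(r)                             # correlation of m and b
>
> With the raw x the correlation comes out near -1; with the centered
> x it is essentially zero.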
>
> I would say, look closely at your problem and think about whether you
> really need the polynomial coefficients.

Yikes!  This sounds like it may be more trouble than it's worth.  I
have a few sets of statistical data that each need a curve fit to
them.  I can see what the parameters are, and they are similar from
one set to the next.  I want to know whether the coefficients are
within a few standard deviations of each other.  That's what I was
hoping to get at, but it seems to be more trouble than I'm willing to
accept, unless anyone has a simpler solution.
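
The kind of check I had in mind is roughly this (a sketch only, and it
ignores the cross-correlations Anne warned about; c1, c2 and cov1,
cov2 are hypothetical names for the coefficients and covariances of
two fits of the same degree, computed as in Chuck's recipe above):

    import numpy as np

    # combined one-sigma uncertainty on each coefficient difference
    sigma = np.sqrt(np.diag(cov1) + np.diag(cov2))
    nsig = np.abs(c1 - c2) / sigma   # distance in standard deviations
    print(nsig <= 3.0)               # True where the fits agree to 3 sigma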

Thanks,
Jeremy
>
> Anne
>
>> residuals, rank, singular_values, rcond : present only if `full` = True
>>        Residuals of the least-squares fit, the effective rank of the scaled
>>        Vandermonde coefficient matrix, its singular values, and the specified
>>        value of `rcond`. For more details, see `linalg.lstsq`.
>>
>> I don't think any of these is the "design matrix" you indicated I
>> need.  Unfortunately, the documentation for linalg.lstsq does not
>> say what rcond is.  Any ideas?
>>
>> Jeremy