[SciPy-User] How to estimate error in polynomial coefficients from scipy.polyfit?
Thu Mar 25 17:45:56 CDT 2010
On 25 March 2010 18:40, Jeremy Conlin <email@example.com> wrote:
> On Thu, Mar 25, 2010 at 4:21 PM, Anne Archibald
> <firstname.lastname@example.org> wrote:
>> On 25 March 2010 17:20, Jeremy Conlin <email@example.com> wrote:
>>> On Thu, Mar 25, 2010 at 2:52 PM, Charles R Harris
>>> <firstname.lastname@example.org> wrote:
>>>> On Thu, Mar 25, 2010 at 2:32 PM, Jeremy Conlin <email@example.com> wrote:
>>>>> I am using scipy.polyfit to fit a curve to my data. Members of this
>>>>> list have been integral in my understanding of how to use this
>>>>> function. Now I would like to know how I can get the uncertainties
>>>>> (standard deviations) of polynomial coefficients from the returned
>>>>> values from scipy.polyfit. If I understand correctly, the residuals
>>>>> are sometimes called the R^2 error, right? That gives an estimate of
>>>>> how well we have fit the data. I don't know how to use rank or any of
>>>>> the other returned values to get the uncertainties.
>>>>> Can someone please help?
>>>> You want the covariance of the coefficients, (A.T * A)^-1/var, where A is
>>>> the design matrix. I'd have to see what the scipy fit returns to tell you
>>>> more. In anycase, from that you can plot curves at +/- sigma to show the
>>>> error bounds on the result. I can be more explicit if you want.
>>> Thanks Chuck. That seems to get closer to what I need. I am just
>>> fitting data to a polynomial, nothing too fancy. I would like the
>>> variance (not the covariance) of the coeffficients. As far as what is
>>> returned from scipy.polyfit this is what the documentation says:
>> Just an important warning: for polynomials, especially high degree
>> polynomials, the coefficients are an awful way to specify them. If
>> what you really need is the coefficients, then you're stuck with it,
>> but if what you need is to estimate interpolated values, there are
>> better approaches.
>> I mention this in particular because you talk about needing "variance
>> rather than covariance". When you do a linear fit of a
>> several-parameter model to data with normal errors, the uncertainty in
>> the fitted parameters is a multidimensional Gaussian - that is, the
>> (say) one-sigma error region is a multidimensional ellipsoid. If you
>> chose a good parameterization for your model, this ellipsoid will line
>> up nicely with the axes, and you can describe it by its size along
>> each axis: that is, you only need a variance in each parameter.
>> But if your parameterization is not good, then this ellipse will be
>> some tilted long skinny shape, and taking its size along one axis can
>> drastically underestimate its size. You also have the problem that
>> there are correlations in your parameters: if one is high, say, then
>> the other is also likely to be high. The covariance matrix captures
>> all this information along with the variances you asked for. As you
>> might expect, life is vastly simpler if the ellipsoid is aligned with
>> the axes, so that this matrix is nearly diagonal and you can just read
>> the variances off the diagonal.
>> Unfortunately, the parameterization of polynomials by their
>> coefficients is not "good" in this sense: you almost always get very
>> long skinny non-aligned error ellipses. An easy example is if you're
>> fitting positive x,y pairs with a straight line y=mx+b. Then if m is
>> too low, b is almost certainly too high, since the uncertainty pivots
>> the line around the middle of the data set. With linear data you can
>> help this by using y = m(x-x_middle)+b, but when you go to
>> higher-order polynomials it all becomes messier and you can't usually
>> cure the problems this easily.
>> I would say, look closely at your problem and think about whether you
>> really need the polynomial coefficients.
> Yikes! This sounds like it may be more trouble than it's worth. I
> have a few sets of statistical data that each need to have curves fit
> to them. I can see what the parameters are and they are similar
> between each set. I want to know if the coefficients are within a few
> standard deviations of each other. That's what I was hoping to get
> at, but it seems to be more trouble than I'm willing to accept.
> Unless anyone has a simpler solution.
You could try using the chebyshev polynomials; they're in the new
numpy, or you can just use the formulas (the degree-n one is cos(n
arccos(x)), and they're sensible on [0,1]). They're not perfect, but
they're a much better representation of polynomials that gets rid of
most of the problems I described above (the covariances tend to be
>>> residuals, rank, singular_values, rcond : present only if `full` = True
>>> Residuals of the least-squares fit, the effective rank of the scaled
>>> Vandermonde coefficient matrix, its singular values, and the specified
>>> value of `rcond`. For more details, see `linalg.lstsq`.
>>> I don't think any of these things are "design matrix" as you have
>>> indicated I need. The documentation for linalg.lstsq does not say
>>> what rcond is unfortunately. Any ideas?
>>> SciPy-User mailing list
>> SciPy-User mailing list
> SciPy-User mailing list
More information about the SciPy-User