# [SciPy-User] How to estimate error in polynomial coefficients from scipy.polyfit?

David Goldsmith d.l.goldsmith@gmail....
Mon Mar 29 12:13:41 CDT 2010

```On Mon, Mar 29, 2010 at 8:08 AM, Jeremy Conlin <jlconlin@gmail.com> wrote:

> On Thu, Mar 25, 2010 at 9:40 PM, David Goldsmith
> <d.l.goldsmith@gmail.com> wrote:
> > On Thu, Mar 25, 2010 at 3:40 PM, Jeremy Conlin <jlconlin@gmail.com>
> wrote:
> >>
> >> Yikes!  This sounds like it may be more trouble than it's worth.  I
> >> have a few sets of statistical data that each need to have curves fit
> >> to them.
> >
> > That's an awfully generic need - it may be obvious from examination of
> the
> > data that a line is inappropriate, but besides polynomials there are many
> > other non-linear models (which can be linearly fit to data by means of
> data
> > transformation) which possess fewer parameters (and thus are simpler from
> a
> > parameter analysis perspective).  So, the question is: why are you
> fitting
> > to polynomials?  If it's just to get a good fit to the data, you might be
> > getting "more fit" than your data warrants (and even if that isn't a
> > problem, you probably want to use a polynomial form different from
> "standard
> > form," e.g., Lagrange interpolators).  Are you sure something like an
> > exponential growth/decay or power law model (both of which are "more
> > natural," linearizable, two-parameter models) wouldn't be more
> appropriate -
> > it would almost certainly be simpler to analyze (and perhaps easier to
> > justify to a referee).
> >
> > On this note, perhaps some of our experts might care to comment: what
> > "physics" (in a generalized sense) gives rise to a polynomial dependency
> of
> > degree higher than two?  The only generic thing I can think of is
> something
> > where third or higher order derivatives proportional to the independent
> > variable are important, and those are pretty uncommon.
>
> I will only be fitting data to a first or second degree polynomial.
> Eventually I would like to fit my data to an exponential or a power
> law, just to see how it compares to a low-order polynomial.  Choosing
> these functions was based on qualitative analysis (i.e. "it looks
>

This is a good approach if applied to the right data: the residuals, i.e.,
the (signed, not absolute value) differences between the predicted and
corresponding measured dependent variable values.  For example, if you fit a
line to quadratic data and then plot the residuals v. the independent
variable, you'll see systematic error: most of the residuals at the
extremities will be on one side of zero, while most of those around the
center will be on the other side of zero, i.e., the residuals will look like
a parabola.  If you then fit the original data to a quadratic model (and a
quadratic model is "physically" appropriate) and plot the residuals, these
should appear to be randomly (and rather symmetrically) distributed around
zero.  This generalizes: systematic behavior of residuals reveals
deficiencies in the assumed form of your model - the "best" (most
"physical") model produces residuals randomly (though not necessarily
uniformly) distributed around zero (because the "best" model captures all
non-randomness in your data).  Moral: when doing regression where you're not
a priori certain of your model, always plot the residuals.

DG

>
> The best case scenario would be that I take what I learn from this
> "simple" example and apply it to more difficult problems as they come
> along down the road.  It appears, however, that it's not so simple to
> apply it to other problems.  I wish I had more time to learn about
> fitting data to curves.  I'm sure there are a lot of powerful tools
> that can help.
>
> Jeremy
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20100329/181e8116/attachment.html
```