[SciPy-User] How to fit data with errorbars

Jeremy Conlin jlconlin@gmail....
Wed Feb 17 09:24:43 CST 2010


On Tue, Feb 16, 2010 at 8:24 PM, Nathaniel Smith <njs@pobox.com> wrote:
> On Tue, Feb 16, 2010 at 10:08 AM, Jeremy Conlin <jlconlin@gmail.com> wrote:
>> I have some data with some errobars that I need to fit to a line.  Is
>> there anyway to use scipy.polyfit with the error associated with the
>> data?  If not, how can I make a fit routine work with my data?
>
> If the error is just in the y (dependent) variable, then this may be
> quite simple. If your y values:
>  -- have a normally distributed error
>  -- whose standard deviation varies from point to point
>  -- but the standard deviations are known (up to a multiplicative constant)
> then it's a classic problem -- perhaps google "heterskedasticity
> correction" or "weighted least squares" or "generalized least
> squares". But basically, just make a model matrix X that has one row
> for each observation, and one column for each predictor. If you want
> to do, say, third-order polynomial fitting and you have two vectors of
> independent variables x1 and x2 and you want to include an intercept,
> then it's something like (all untested):
>
> X = np.column_stack([np.ones(len(x1)), x1, x1 ** 2, x1 ** 3, x2, x2 **
> 2, x2 ** 3])
>
> Now a normal least squares fit, ignoring the errorbars, would be:
>
> beta = np.linalg.lstsq(X, y)[0]
>
> with predicted values np.dot(X, beta).
>
> If you have the standard deviation for each measurement in a vector
> stddevs, then the proper heteroskedasticity-corrected fit is just:
>
> beta = np.linalg.lstsq((1 / stddevs) * X, (1 / stddevs) * y)[0]

Yes this is what I have, an array of standard deviations for each data
point.  So this might work. Since I'm doing just a linear fit, I'm
guessing I'd make my X as:

X = np.column_stack([np.ones(len(x1)), x1])

right?

After reading into curve fitting, I realize I know next to nothing
about how this is done.  Can anyone recommend a good book or some sort
of reference to describe curve fitting in general?  I need to
investigate this further.

Thanks,
Jeremy


More information about the SciPy-User mailing list