[SciPy-User] How to fit data with errorbars

Joe Harrington jh@physics.ucf....
Wed Feb 17 11:12:12 CST 2010


> I have some data with some errobars that I need to fit to a line.

Actually, you should fit the line; please don't change the data! ;-)

The routine below does the job.  I was surprised not to find this
routine in numpy or scipy as it's very standard (I'm still convinced I
must have missed it, so somebody please embarrass me and point it
out).

The routine has a number of parameters that are standard in other
numerical packages, such as IDL, for *every* fitting routine.  We
should consider adding these parameters to all our fitting routines,
as well, since having them encourages people to fit (more) properly
(i.e., actually assess the goodness of fit, etc.).

--jh--

import numpy as np
import scipy.special as ss

def linfit(y, x=None, y_unc=None):
  '''
    Fits a line to 2D data, optionally with errors in y.

    The method is robust to roundoff error.

    Parameters
    ----------
    y: ndarray
        Ordinates, any shape.
    x: ndarray
        Abcissas, same shape.  Defaults to np.indices(y.length))
    y_unc: ndarray
        Uncertainties in y.  If scalar or 1-element array, applied
        uniformly to all y values.  [NOT IMPLEMENTED YET!]  Must be
        positive.

    Returns
    -------
    a: scalar
        0 Fitted intercept
    b: scalar
        1 Fitted slope
    a_unc: scalar
        2 Uncertainty of fitted intercept
    b_unc: scalar
        3 Uncertainty of fitted slope
    chisq: scalar
        4 Chi-squared
    prob: scalar
        5 Probability of finding worse Chi-squared for this model with
          these uncertainties.
    covar: ndarray
        6 Covariance matrix: [[a_unc**2,  covar_ab],
		              [covar_ab,  b_unc**2]]
    yfit: ndarray
        7 Model array calculated for our abcissas
    
    Notes
    -----
    
    If prob > 0.1, you can believe the fit.  If prob > 0.001 and the
    errors are not Gaussian, you could believe the fit.  Otherwise
    do not believe it.

    See Also
    --------
    Press, et al., Numerical Recipes in C, 2nd ed, section 15.2,
    or any standard data analysis text.

    Examples
    --------
    >>> import linfit

    >>> a = 1.
    >>> b = 2.
    >>> nx = 10
    >>> x = np.arange(10, dtype='float')
    >>> y = a + b * x
    >>> y_unc = numpy.ones(nx)
    >>> y[::2]  += 1
    >>> y[1::2] -= 1
    >>> a, b, sa, sb, chisq, prob, covar, yfit = linfit.linfit(y, x, y_unc)
    >>> print(a, b, sa, sb, chisq, prob, covar, yfit)
(1.272727272727272, 1.9393939393939394, 0.58775381364525869, 0.11009637651263605, 9.6969696969696937, 0.28694204178663996, array([[ 0.34545455, -0.05454545],
       [-0.05454545,  0.01212121]]), array([  1.27272727,   3.21212121,   5.15151515,   7.09090909,
         9.03030303,  10.96969697,  12.90909091,  14.84848485,
        16.78787879,  18.72727273]))

    Revisons
    --------
    2007-09-23 0.1  jh@physics.ucf.edu	Initial version
    2007-09-25 0.2  jh@physics.ucf.edu	Fixed bug reported by Kevin Stevenson.
    2008-10-09 0.3  jh@physics.ucf.edu	Fixed doc bug.
    2009-10-01 0.4  jh@physics.ucf.edu  Updated docstring, imports.
  '''
  # standardize and test inputs
  if x == None:
    x = np.indices(y.length, dtype=y.dtype)
    x.shape = y.shape

  if y_unc == None:
      y_unc = np.ones(y.shape, dtype=y.dtype)
  
  # NR Eq. 15.2.4
  ryu2  = 1. / y_unc**2
  S     = np.sum(1.    * ryu2)
  Sx    = np.sum(x     * ryu2)
  Sy    = np.sum(y     * ryu2)
  # Sxx = np.sum(x**2  * ryu2) # not used in the robust method
  # Sxy = np.sum(x * y * ryu2) # not used in the robust method

  # NR Eq. 15.2.15 - 15.2.18 (i.e., the robust method)
  t = 1. / y_unc * (x - Sx / S)
  Stt = np.sum(t**2)

  b = 1. / Stt * np.sum(t * y / y_unc)
  a = (Sy - Sx * b) / S

  covab = -Sx / (S * Stt)                  # NR Eq. 15.2.21

  sa = np.sqrt(1. / S * (1. - Sx * covab)) # NR Eq. 15.2.19
  sb = np.sqrt(1. / Stt)                   # NR Eq. 15.2.20

  rab = covab / (sa * sb)                  # NR Eq. 15.2.22
  
  covar = np.array([[sa**2, covab],
                    [covab, sb**2]])
  
  yfit = a + b * x
  chisq = np.sum( ((y - yfit) / y_unc)**2 )
  
  prob = 1. - ss.gammainc( (y.size - 2.) / 2., chisq / 2.)
  
  return a, b, sa, sb, chisq, prob, covar, yfit



More information about the SciPy-User mailing list