# [SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit (Christoph Deil)

Matt Newville newville@cars.uchicago....
Thu Sep 1 14:45:31 CDT 2011

Christoph,

I ran some tests with the scaling of the covariance matrix from
scipy.optimize.leastsq.  Using lmfit-py, and scipy.optimize.leastsq, I
did fits with sample function of a Gaussian + Line + Simulated Noise
(random.normal(scale=0.023).  The "data" had 201 points, and the fit
had 5 variables.  I ran fits with covariance scaling on and off, and
"data sigma" = 0.1, 0.2, and 0.5.  The full results are below, and the
code to run this is

https://github.com/newville/lmfit-py/blob/master/tests/test_covar.py

You'll need the latest git version of lmfit-py for this to be able to
turn on/off the covariance scaling.

As you can see from the results below, by scaling the covariance, the
estimated uncertainties in the parameters are independent of sigma,
the estimated uncertainty in the data.  In many cases this is a
fabulously convenient feature.

As expected, when the covariance matrix is not scaled by sum_sqr /
nfree, the estimated uncertainties in the variables depends greatly on
the value of sigma.  For the "correct" sigma of 0.23 in this one test
case, the scaled and unscaled values are very close:

amp_g = 20.99201 +/- 0.05953 (unscaled) vs +/- 0.05423 (scaled) [True=21.0]
cen_g =  8.09857 +/- 0.00435 (unscaled) vs +/- 0.00396 (scaled) [True= 8.1]

and so on.  The scaled uncertainties appear to be about 10% larger
than the unscaled ones.  Since this was just one set of data (ie, one
set of simulated noise), I'm not sure whether this difference is
significant or important to you.  Interestingly, these two cases are
in better agreement than comparing sigma=0.20 and sigma=0.23 for the
unscaled covariance matrix.

For myself, I much prefer having estimated uncertainties that may be
off by 10% than being expected to know the uncertainties in the data
to 10%.  But then, I work in a field where we have lots of data and
systematic errors in collection and processing swamp any simple
estimate of the noise in the data.

As a result, I think the scaling should stay the way it is.

Cheers,

--Matt Newville <newville at cars.uchicago.edu> 630-252-0431

====  scale_covar =  True  ===
sigma          =  0.1
chisqr         =  860.573615001
reduced_chisqr =  4.39068170919
amp_g:          20.99201+/- 0.05423 (inital=  10.000000,
model_value= 21.000000)
cen_g:          8.09857+/- 0.00396 (inital=  9.000000, model_value= 8.100000)
line_off:      -0.99346+/- 0.03478 (inital=  0.000000, model_value=-1.023000)
line_slope:     0.61737+/- 0.00266 (inital=  0.000000, model_value= 0.620000)
wid_g:          1.59679+/- 0.00508 (inital=  1.000000, model_value= 1.600000)
==============================
sigma          =  0.2
chisqr         =  215.14340375
reduced_chisqr =  1.0976704273
amp_g:          20.99201+/- 0.05423 (inital=  10.000000,
model_value= 21.000000)
cen_g:          8.09857+/- 0.00396 (inital=  9.000000, model_value= 8.100000)
line_off:      -0.99346+/- 0.03478 (inital=  0.000000, model_value=-1.023000)
line_slope:     0.61737+/- 0.00266 (inital=  0.000000, model_value= 0.620000)
wid_g:          1.59679+/- 0.00508 (inital=  1.000000, model_value= 1.600000)
==============================
sigma          =  0.23
chisqr         =  162.679322306
reduced_chisqr =  0.82999654238
amp_g:          20.99201+/- 0.05423 (inital=  10.000000,
model_value= 21.000000)
cen_g:          8.09857+/- 0.00396 (inital=  9.000000, model_value= 8.100000)
line_off:      -0.99346+/- 0.03478 (inital=  0.000000, model_value=-1.023000)
line_slope:     0.61737+/- 0.00266 (inital=  0.000000, model_value= 0.620000)
wid_g:          1.59679+/- 0.00508 (inital=  1.000000, model_value= 1.600000)
==============================
sigma          =  0.5
chisqr         =  34.4229446
reduced_chisqr =  0.175627268368
amp_g:          20.99201+/- 0.05423 (inital=  10.000000,
model_value= 21.000000)
cen_g:          8.09857+/- 0.00396 (inital=  9.000000, model_value= 8.100000)
line_off:      -0.99346+/- 0.03478 (inital=  0.000000, model_value=-1.023000)
line_slope:     0.61737+/- 0.00266 (inital=  0.000000, model_value= 0.620000)
wid_g:          1.59679+/- 0.00508 (inital=  1.000000, model_value= 1.600000)
==============================
====  scale_covar =  False  ===
sigma          =  0.1
chisqr         =  860.573615001
reduced_chisqr =  4.39068170919
amp_g:          20.99201+/- 0.02588 (inital=  10.000000,
model_value= 21.000000)
cen_g:          8.09857+/- 0.00189 (inital=  9.000000, model_value= 8.100000)
line_off:      -0.99346+/- 0.01660 (inital=  0.000000, model_value=-1.023000)
line_slope:     0.61737+/- 0.00127 (inital=  0.000000, model_value= 0.620000)
wid_g:          1.59679+/- 0.00242 (inital=  1.000000, model_value= 1.600000)
==============================
sigma          =  0.2
chisqr         =  215.14340375
reduced_chisqr =  1.0976704273
amp_g:          20.99201+/- 0.05177 (inital=  10.000000,
model_value= 21.000000)
cen_g:          8.09857+/- 0.00378 (inital=  9.000000, model_value= 8.100000)
line_off:      -0.99346+/- 0.03320 (inital=  0.000000, model_value=-1.023000)
line_slope:     0.61737+/- 0.00254 (inital=  0.000000, model_value= 0.620000)
wid_g:          1.59679+/- 0.00485 (inital=  1.000000, model_value= 1.600000)
==============================
sigma          =  0.23
chisqr         =  162.679322306
reduced_chisqr =  0.82999654238
amp_g:          20.99201+/- 0.05953 (inital=  10.000000,
model_value= 21.000000)
cen_g:          8.09857+/- 0.00435 (inital=  9.000000, model_value= 8.100000)
line_off:      -0.99346+/- 0.03818 (inital=  0.000000, model_value=-1.023000)
line_slope:     0.61737+/- 0.00292 (inital=  0.000000, model_value= 0.620000)
wid_g:          1.59679+/- 0.00558 (inital=  1.000000, model_value= 1.600000)
==============================
sigma          =  0.5
chisqr         =  34.4229446
reduced_chisqr =  0.175627268368
amp_g:          20.99201+/- 0.12941 (inital=  10.000000,
model_value= 21.000000)
cen_g:          8.09857+/- 0.00945 (inital=  9.000000, model_value= 8.100000)
line_off:      -0.99346+/- 0.08300 (inital=  0.000000, model_value=-1.023000)
line_slope:     0.61737+/- 0.00636 (inital=  0.000000, model_value= 0.620000)
wid_g:          1.59679+/- 0.01212 (inital=  1.000000, model_value= 1.600000)
==============================