# [SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit

Bruce Southey bsouthey@gmail....
Fri Sep 2 08:59:56 CDT 2011

```
On 09/02/2011 01:43 AM, Matt Newville wrote:
> Hi Cristoph,
>
> On Thu, Sep 1, 2011 at 4:29 PM, Christoph Deil
>> On Sep 1, 2011, at 9:45 PM, Matt Newville wrote:
>>
>>> Christoph,
>>>
>>> I ran some tests with the scaling of the covariance matrix from
>>> scipy.optimize.leastsq.  Using lmfit-py, and scipy.optimize.leastsq, I
>>> did fits with a sample function of a Gaussian + Line + Simulated Noise
>>> (random.normal(scale=0.23)).  The "data" had 201 points, and the fit
>>> had 5 variables.  I ran fits with covariance scaling on and off, and
>>> "data sigma" = 0.1, 0.2, and 0.5.  The full results are below, and the
>>> code to run this is
>>>
>>>    https://github.com/newville/lmfit-py/blob/master/tests/test_covar.py
>>>
>>> You'll need the latest git version of lmfit-py for this to be able to
>>> turn on/off the covariance scaling.
>>>
>>> As you can see from the results below, by scaling the covariance, the
>>> estimated uncertainties in the parameters are independent of sigma,
>>> the estimated uncertainty in the data.  In many cases this is a
>>> fabulously convenient feature.
>>>
>>> As expected, when the covariance matrix is not scaled by sum_sqr /
>>> nfree, the estimated uncertainties in the variables depend greatly on
>>> the value of sigma.  For the "correct" sigma of 0.23 in this one test
>>> case, the scaled and unscaled values are very close:
>>>
>>>    amp_g = 20.99201 +/- 0.05953 (unscaled) vs +/- 0.05423 (scaled) [True=21.0]
>>>    cen_g =  8.09857 +/- 0.00435 (unscaled) vs +/- 0.00396 (scaled) [True= 8.1]
>>>
>>> and so on.  The scaled uncertainties appear to be about 10% larger
>>> than the unscaled ones.  Since this was just one set of data (ie, one
>>> set of simulated noise), I'm not sure whether this difference is
>>> significant or important to you.  Interestingly, these two cases are
>>> in better agreement than comparing sigma=0.20 and sigma=0.23 for the
>>> unscaled covariance matrix.
>>>
>>> For myself, I much prefer having estimated uncertainties that may be
>>> off by 10% than being expected to know the uncertainties in the data
>>> to 10%.  But then, I work in a field where we have lots of data and
>>> systematic errors in collection and processing swamp any simple
>>> estimate of the noise in the data.
>>>
>>> As a result, I think the scaling should stay the way it is.
>> I think there are use cases for scaling and for not scaling.
>> Adding an option to scipy.optimize.curve_fit, as you did for lmfit, is a nice solution.
>> Returning the scaling factor s = chi2 / ndf (or chi2 and ndf independently) would be another option to let the user decide what she wants.
>>
>> The numbers you give in your example are small because your chi2 / ndf is approximately one, so your scaling factor is approximately one.
> Ah, right.  I see your point.  Scaling the covariance matrix is
> equivalent to asserting that the fit is good (reduced chi_square = 1),
> and so scaling sigma such that this is the case, and getting the
> parameter uncertainties accordingly.  This, and the fact that reduced
> chi_square was slightly less than 1 (0.83) in the example I gave,
> explains the ~10% difference in uncertainties.  But again, that's an
> idealized case.
>
>> If the model doesn't represent the data well, then chi2 / ndf is larger than one and the differences in estimated parameter errors become larger.
> I think the question is: should the estimated uncertainties reflect
> the imperfection of the model?  I can see merit in both methods (which
> differ by a factor of sqrt(reduced_chi_square)):
>
>     a) Given an estimate of sigma, estimate the uncertainties.  Use
> unscaled covariance.
>     b) Assert that this is a "good fit",  estimate the uncertainties.
> Use scaled covariance.
>
> That is, one might have a partial estimate of sigma, and use reduced
> chi_square to assess how good that estimate is.
>
>> IMO if the user does give sigmas to curve_fit, it means that she has reason to believe that these are the errors on the data points
>> and thus the default should be to not apply the scaling factor in that case.
>> On the other hand at the moment the scaling factor is always applied, so having a keyword option
>> scale_covariance=True as default means backwards compatibility.
> I believe you're proposing that default behavior should be "if sigma
> is given, use method a, otherwise use method b".  I think that's
> reasonable, but don't have a strong opinion.  I'm not sure I see a
> case for changing curve_fit, but I'm not committed to the behavior of
> lmfit-py.  Perhaps the best thing to do would be to leave covariance
> matrix unscaled, but scale the estimated uncertainties as you propose.
>
> Cheers,
>
> --Matt Newville<newville at cars.uchicago.edu>  630-252-0431
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
I think there is some confusion about the methodology.

Josef provided the normal equations, but as he indicated, those come from
the assumption of independently and identically distributed
residuals/errors (the assumption of normality is only relevant for
hypothesis testing). It is more instructive to use generalized least
squares (http://en.wikipedia.org/wiki/Generalized_least_squares), because
that essentially avoids that assumption:

X'*V^{-1}*X*b = X'*V^{-1}*Y

Where V is the variance-covariance matrix of the residuals. For example,
ordinary least squares treats V as the identity matrix times the residual
variance, so the common factor cancels from both sides and the whole thing
'collapses' to:
X'*X*b = X'*Y

Most programs assume some structure matrix times a common variance. To me,
the 'sigma' argument in the scipy.optimize.curve_fit docstring appears to
be the same as supplying a weight statement in weighted least squares. So
I would not expect you to get exactly the same solutions from your two
approaches.

I do not know what you are doing in lmfit-py or by 'scaling', but it
seems like you have just scaled the data by a value close to the
estimated residual variance (from the standard errors provided,
0.00435/0.00396 = 1.1).  That is, you are just giving every observation
a common weight of about 1.1 (or 1.1*1.1, depending on how the weight is
used, as some programs want the square or its inverse). But you are
forgetting that common weight in your comparison.

Consequently, you need to be able to test the variance-covariance
structures (see mixed models for more details).

Bruce
```
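The scaled/unscaled distinction Matt describes can be sketched numerically. Below is a minimal pure-Python straight-line fit (no SciPy required); the data points and the assumed per-point sigma are invented for illustration. Method (a) trusts the assumed sigma; method (b) rescales so that reduced chi-square equals 1, which is what curve_fit's covariance scaling does. The two error estimates differ by exactly sqrt(reduced chi-square), as noted in the thread.

```python
# Sketch of scaled vs. unscaled parameter uncertainties for y = a + b*x.
# All data and the assumed sigma below are hypothetical.

def line_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, inv_xtx)."""
    n = len(xs)
    sx = sum(xs); sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sxx - sx * sx
    b = (n * sxy - sx * sy) / d
    a = (sy - b * sx) / n
    # (X'X)^{-1} for the design matrix with columns [1, x]
    inv_xtx = [[sxx / d, -sx / d], [-sx / d, n / d]]
    return a, b, inv_xtx

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]   # roughly y = 2x plus noise

a, b, inv_xtx = line_fit(xs, ys)
resid = [y - (a + b * x) for x, y in zip(xs, ys)]
ssr = sum(r * r for r in resid)
nfree = len(xs) - 2                    # n points minus 2 parameters

sigma_assumed = 0.5                    # hypothetical per-point uncertainty
# Method (a): unscaled -- trust sigma_assumed
err_b_unscaled = (sigma_assumed**2 * inv_xtx[1][1]) ** 0.5
# Method (b): scaled -- assert reduced chi-square = 1
s2 = ssr / nfree
err_b_scaled = (s2 * inv_xtx[1][1]) ** 0.5

# The two estimates differ exactly by sqrt(reduced chi-square):
red_chi2 = ssr / (sigma_assumed**2 * nfree)
assert abs(err_b_scaled - err_b_unscaled * red_chi2**0.5) < 1e-12
```

When sigma_assumed happens to match the true noise level, red_chi2 is near 1 and the two estimates nearly coincide, which is why Matt's example with sigma = 0.23 shows only a ~10% difference.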
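Bruce's point that a common variance cancels out of the generalized least-squares normal equations X'V^{-1}X b = X'V^{-1}Y can likewise be checked with a toy example. The data and the sigma value here are made up; the solver is a hand-rolled weighted fit for y = a + b*x.

```python
# With V = (common variance) * I, the weight factor rescales both sides of
# the normal equations, so the parameter estimates equal plain OLS.
# Data and the common sigma below are hypothetical.

def solve_wls(xs, ys, weights):
    """Solve the weighted normal equations for y = a + b*x."""
    # Build X'WX and X'WY for design matrix columns [1, x]
    s0 = sum(weights)
    s1 = sum(w * x for w, x in zip(weights, xs))
    s2 = sum(w * x * x for w, x in zip(weights, xs))
    t0 = sum(w * y for w, y in zip(weights, ys))
    t1 = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    d = s0 * s2 - s1 * s1
    a = (s2 * t0 - s1 * t1) / d
    b = (s0 * t1 - s1 * t0) / d
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.2, 2.1, 3.9, 6.2]

ols = solve_wls(xs, ys, [1.0] * 4)                # V = I
wls_common = solve_wls(xs, ys, [1 / 0.3**2] * 4)  # V = 0.3^2 * I

# A common variance cancels, so the solutions agree:
assert all(abs(p - q) < 1e-12 for p, q in zip(ols, wls_common))
```

The common factor only matters for the covariance of the estimates, not for the estimates themselves, which is exactly why forgetting it distorts a comparison of standard errors but not of fitted parameters.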