# [SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit

Thu Sep 1 05:04:21 CDT 2011

```On Sep 1, 2011, at 3:45 AM, josef.pktd@gmail.com wrote:

> On Wed, Aug 31, 2011 at 12:10 PM,  <josef.pktd@gmail.com> wrote:
>> On Wed, Aug 31, 2011 at 11:09 AM, Christoph Deil
>>>
>>> On Aug 30, 2011, at 11:25 PM, josef.pktd@gmail.com wrote:
>>>
>>> On Tue, Aug 30, 2011 at 4:15 PM, Christoph Deil
>>>
>>> I noticed that scipy.optimize.curve_fit returns parameter errors that don't
>>>
>>> scale with sigma, the standard deviation of ydata, as I expected.
>>>
>>> Here is a code snippet to illustrate my point, which fits a straight line to
>>>
>>> five data points:
>>>
>>> import numpy as np
>>>
>>> from scipy.optimize import curve_fit
>>>
>>> x = np.arange(5)
>>>
>>> y = np.array([1, -2, 1, -2, 1])
>>>
>>> sigma = np.array([1,  2, 1,  2, 1])
>>>
>>> def f(x, a, b):
>>>
>>>     return a + b * x
>>>
>>> popt, pcov = curve_fit(f, x, y, p0=(0.42, 0.42), sigma=sigma)
>>>
>>> perr = np.sqrt(pcov.diagonal())
>>>
>>> print('*** sigma = {0} ***'.format(sigma))
>>>
>>> print('popt: {0}'.format(popt))
>>>
>>> print('perr: {0}'.format(perr))
>>>
>>> I get the following result:
>>>
>>> *** sigma = [1 2 1 2 1] ***
>>>
>>> popt: [  5.71428536e-01   1.19956213e-08]
>>>
>>> perr: [ 0.93867933  0.40391117]
>>>
>>> Increasing sigma by a factor of 10,
>>>
>>> sigma = 10 * np.array([1,  2, 1,  2, 1])
>>>
>>> I get the following result:
>>>
>>> *** sigma = [10 20 10 20 10] ***
>>>
>>> popt: [  5.71428580e-01  -2.27625699e-09]
>>>
>>> perr: [ 0.93895295  0.37079075]
>>>
>>> The best-fit values stayed the same as expected.
>>>
>>> But the error on the slope b decreased by 8% (the error on the offset a
>>>
>>> didn't change much)
>>>
>>> I would have expected fit parameter errors to increase with increasing
>>>
>>> errors on the data!?
>>>
>>> Is this a bug?
>>>
>>> No bug in the formulas. I tested all of them when curve_fit was added.
>>>
>>> However in your example the numerical cov lacks quite a bit of
>>> precision. Trying your example with different starting values, I get a
>>> 0.05 difference in your perr (std of parameter estimates).
>>>
>>> Trying smaller xtol and ftol doesn't change anything. (?)
>>>
>>> Making ftol = 1e-15 very small I get a different wrong result:
>>> popt: [  5.71428580e-01  -2.27625699e-09]
>>> perr: [ 0.92582011  0.59868281]
>>> What do I have to do to get a correct answer (say to 5 significant digits)
>>> from curve_fit for this simple example?
>>>
>>> Since it's linear
>>>
>>> import scikits.statsmodels.api as sm
>>>
>>> x = np.arange(5.)
>>>
>>> y = np.array([1, -2, 1, -2, 1.])
>>>
>>> sigma = np.array([1,  2, 1,  2, 1.])
>>>
>>> res = sm.WLS(y, sm.add_constant(x, prepend=True), weights=1./sigma**2).fit()
>>>
>>> res.params
>>>
>>> array([  5.71428571e-01,   1.11022302e-16])
>>>
>>> res.bse
>>>
>>> array([ 0.98609784,  0.38892223])
>>>
>>> res = sm.WLS(y, sm.add_constant(x, prepend=True),
>>> weights=1./(sigma*10)**2).fit()
>>>
>>> res.params
>>>
>>> array([  5.71428571e-01,   1.94289029e-16])
>>>
>>> res.bse
>>>
>>> array([ 0.98609784,  0.38892223])
>>>
>>> rescaling doesn't change parameter estimates nor perr
>>>
>>> This is what I don't understand.
>>> Why don't the parameter estimate errors increase with increasing errors
>>> sigma on the data points?
>>> If I have less precise measurements, the model parameters should be less
>>> constrained?!
>>> I was using MINUIT before I learned Scipy and the error definition for a
>>> chi2 fit given in the MINUIT User Guide
>>> http://wwwasdoc.web.cern.ch/wwwasdoc/minuit/node7.html
>>> as well as the example results here
>>> don't mention the factor s_sq that is used in curve_fit to scale pcov.
>>> Is the error definition in the MINUIT manual wrong?
>>> Can you point me to a web resource that explains why the s_sq factor needs
>>> to be applied to the covariance matrix?
>>
>> It's standard text book information, but Wikipedia seems to be lacking
>> a bit in this.
>>
>> for the linear case
>> http://en.wikipedia.org/wiki/Ordinary_least_squares#Assuming_normality
>>
>> cov_params = sigma^2 (X'X)^{-1}
>>
>> for the non-linear case with leastsq, X is replaced by Jacobian,
>> otherwise everything is the same.
>>
>> However, in your minuit links I saw only the Hessian mentioned (from
>> very fast skimming the pages)
>>
>> With maximum likelihood, the inverse Hessian is the complete
>> covariance matrix, no additional multiplication is necessary.
>>
>> Essentially, these are implementation details depending on how the
>> estimation is calculated, and there are various ways of numerically
>> approximating the Hessian.
>> That's why this is described for optimize.leastsq (incorrectly as
>> Chuck pointed out) and but not in optimize.curve_fit.
>>
>> With leastsquares are maximum likelihood, rescaling both y and
>> f(x,params) has no effect on the parameter estimates, it's just like
>> changing units of y, meters instead of centimeters.
>>
>> I guess scipy.odr would work differently, since it is splitting up the
>> errors between y and x's, but I never looked at the details.

OK, now I understand, thanks for your explanations.
Roughly speaking, scipy.optimize.curve_fit applies a factor s_sq = chi^2 / ndf to the covariance matrix to account for possibly incorrect overall scale in the y errors "sigma" of the data.

I had simply not seen this factor being applied to chi^2 fits before. E.g. in many physics and astronomy papers, parameter errors from chi^2 fit results are reported without this factor. Also the manual of the fitting package used by most physicists (MINUIT) as well as the statistics textbook I use (Cowan -- Statistical data analysis) don't mention it.

May I therefore suggest to explicitly mention this factor in the scipy.optimize.curve_fit docstring to avoid confusion?

>>
>>
>>>
>>> Josef
>>>
>>>
>>>
>>> PS: I've attached a script to fit the two examples using statsmodels, scipy
>>> and minuit (applying the s_sq factor myself).
>>> Here are the results I get (who's right for the first example? why does
>>> statsmodels only return on parameter value and error?):
>>>     """Example from
>>>     x = np.array([1  , 2  , 3  , 4  ])
>>>     y = np.array([1.1, 2.1, 2.4, 4.3])
>>>     sigma = np.array([0.1, 0.1, 0.2, 0.1])
>>> statsmodels.api.WLS
>>> popt: [ 1.04516129]
>>> perr: [ 0.0467711]
>>> scipy.optimize.curve_fit
>>> popt: [  8.53964011e-08   1.04516128e+00]
>>> perr: [ 0.27452122  0.09784324]
>>
>> that's what I get with example 1 when I run your script,
>> I don't know why you have one params in your case

I'll file a statsmodels issue on github.

>> (full_output threw an exception in curve_fit with scipy.__version__ '0.9.0'

>>
>> statsmodels.api.WLS
>> popt: [ -6.66133815e-16   1.04516129e+00]
>> perr: [ 0.33828314  0.12647671]
>> scipy.optimize.curve_fit
>> popt: [  8.53964011e-08   1.04516128e+00]
>> perr: [ 0.27452122  0.09784324]
>>
>>
>>> minuit
>>> popt: [-4.851674617611934e-14, 1.0451612903225629]
>>> perr: [ 0.33828315  0.12647671]
>
> statsmodels.api.WLS
> popt: [ -4.90926744e-16   1.04516129e+00]
> perr: [ 0.33828314  0.12647671]
> statsmodels NonlinearLS
> popt: [ -3.92166386e-08   1.04516130e+00]
> perr: [ 0.33828314  0.12647671]
>
>
> finally, I got some bugs out of the weights handling, but still not fully tested
>
> def run_nonlinearls():
>    from scikits.statsmodels.miscmodels.nonlinls import NonlinearLS
>
>    class Myfunc(NonlinearLS):
>
>        def _predict(self, params):
>            x = self.exog
>            a, b = params
>            return a + b*x
>
>    mod = Myfunc(y, x, sigma=sigma**2)
>    res = mod.fit(start_value=(0.042, 0.42))
>    print ('statsmodels NonlinearLS')
>    print('popt: {0}'.format(res.params))
>    print('perr: {0}'.format(res.bse))
>

I'm looking forward to using NonlinearLS once it makes it's way in master.

> The basics is the same as curve_fit using leastsq, but it uses complex
> derivatives which are usually numerically very good.
>
> So it looks like the problems with curve_fit in your example are only
> in the numerically derivatives that leastsq is using for the Jacobian.
>
> If leastsq is using only forward differences, then it might be better
> to calculate the final Jacobian with centered differences. just a
> guess.
>
>
>>
>> statsmodels and minuit agree pretty well
>>
>>>     """Example from
>>> http://mail.scipy.org/pipermail/scipy-user/2011-August/030412.html"""
>>>     x = np.arange(5)
>>>     y = np.array([1, -2, 1, -2, 1])
>>>     sigma = 10 * np.array([1,  2, 1,  2, 1])
>>> statsmodels.api.WLS
>>> popt: [  5.71428571e-01   7.63278329e-17]
>>> perr: [ 0.98609784  0.38892223]
>>> scipy.optimize.curve_fit
>>> popt: [  5.71428662e-01  -8.73679511e-08]
>>> perr: [ 0.97804034  0.3818681 ]
>>> minuit
>>> popt: [0.5714285714294132, 2.1449508835758024e-13]
>>> perr: [ 0.98609784  0.38892223]
>
> statsmodels.api.WLS
> popt: [  5.71428571e-01   1.94289029e-16]
> perr: [ 0.98609784  0.38892223]
> statsmodels NonlinearLS
> popt: [  5.71428387e-01   8.45750929e-08]
> perr: [ 0.98609784  0.38892223]
>
> Josef
>
>>
>> statsmodels and minuit agree,
>>
>> my guess is that the jacobian calculation of leastsq (curve_fit) is
>> not very good in these examples. Maybe trying Dfun or the other
>> options, epsfcn, will help.
>>
>> I was trying to see whether I get better results calculation the
>> numerical derivatives in a different way, but had to spend the time
>> fixing bugs.
>> (NonlinearLS didn't work correctly with weights.)
>>
>> Josef
>>
>>>
>>>
>>>
>>>
>>> Looking at the source code I see that scipy.optimize.curve_fit multiplies
>>>
>>> the pcov obtained from scipy.optimize.leastsq by a factor s_sq:
>>>
>>> https://github.com/scipy/scipy/blob/master/scipy/optimize/minpack.py#L438
>>>
>>>     if (len(ydata) > len(p0)) and pcov is not None:
>>>
>>>         s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
>>>
>>>         pcov = pcov * s_sq
>>>
>>> If so is it possible to add an explanation to
>>>
>>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
>>>
>>> that pcov is multiplied with this s_sq factor and why that will give correct
>>>
>>> errors?
>>>
>>> After I noticed this issue I saw that this s_sq factor is mentioned in the
>>>
>>> cov_x return parameter description of leastsq,
>>>
>>> but I think it should be explained in curve_fit where it is applied, maybe
>>>
>>> leaving a reference in the cov_x leastsq description.
>>>
>>> Also it would be nice to mention the full_output option in the curve_fit
>>>
>>> docu, I only realized after looking at the source code that this was
>>>
>>> possible.
>>>
>>> Christoph
>>>
>>> _______________________________________________
>>>
>>> SciPy-User mailing list
>>>
>>> SciPy-User@scipy.org
>>>
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>
>>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

```