[SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit

Christoph Deil deil.christoph@googlemail....
Tue Aug 30 15:15:17 CDT 2011


I noticed that scipy.optimize.curve_fit returns parameter errors that don't scale with sigma, the standard deviation of ydata, as I expected.

Here is a code snippet to illustrate my point, which fits a straight line to five data points:
import numpy as np
from scipy.optimize import curve_fit
x = np.arange(5)
y = np.array([1, -2, 1, -2, 1])
sigma = np.array([1,  2, 1,  2, 1])
def f(x, a, b):
    return a + b * x
popt, pcov = curve_fit(f, x, y, p0=(0.42, 0.42), sigma=sigma)
perr = np.sqrt(pcov.diagonal())
print('*** sigma = {0} ***'.format(sigma))
print('popt: {0}'.format(popt))
print('perr: {0}'.format(perr))

I get the following result:
*** sigma = [1 2 1 2 1] ***
popt: [  5.71428536e-01   1.19956213e-08]
perr: [ 0.93867933  0.40391117]

Increasing sigma by a factor of 10,
sigma = 10 * np.array([1,  2, 1,  2, 1])
I get the following result:
*** sigma = [10 20 10 20 10] ***
popt: [  5.71428580e-01  -2.27625699e-09]
perr: [ 0.93895295  0.37079075]

The best-fit values stayed the same as expected.
But the error on the slope b decreased by 8% (the error on the offset a didn't change much)
I would have expected fit parameter errors to increase with increasing errors on the data!?
Is this a bug?

Looking at the source code I see that scipy.optimize.curve_fit multiplies the pcov obtained from scipy.optimize.leastsq by a factor s_sq:
https://github.com/scipy/scipy/blob/master/scipy/optimize/minpack.py#L438
    if (len(ydata) > len(p0)) and pcov is not None:
        s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
        pcov = pcov * s_sq

If so is it possible to add an explanation to
http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
that pcov is multiplied with this s_sq factor and why that will give correct errors?

After I noticed this issue I saw that this s_sq factor is mentioned in the cov_x return parameter description of leastsq,
but I think it should be explained in curve_fit where it is applied, maybe leaving a reference in the cov_x leastsq description.

Also it would be nice to mention the full_output option in the curve_fit docu, I only realized after looking at the source code that this was possible.

Christoph
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20110830/4bc0cfa6/attachment.html 


More information about the SciPy-User mailing list