[SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit

Charles R Harris charlesr.harris@gmail....
Tue Aug 30 22:19:36 CDT 2011


On Tue, Aug 30, 2011 at 2:15 PM, Christoph Deil <
deil.christoph@googlemail.com> wrote:

> I noticed that scipy.optimize.curve_fit returns parameter errors that don't
> scale with sigma, the standard deviation of ydata, as I expected.
>
> Here is a code snippet to illustrate my point, which fits a straight line
> to five data points:
> import numpy as np
> from scipy.optimize import curve_fit
> x = np.arange(5)
> y = np.array([1, -2, 1, -2, 1])
> sigma = np.array([1,  2, 1,  2, 1])
> def f(x, a, b):
>     return a + b * x
> popt, pcov = curve_fit(f, x, y, p0=(0.42, 0.42), sigma=sigma)
> perr = np.sqrt(pcov.diagonal())
> print('*** sigma = {0} ***'.format(sigma))
> print('popt: {0}'.format(popt))
> print('perr: {0}'.format(perr))
>
> I get the following result:
> *** sigma = [1 2 1 2 1] ***
> popt: [  5.71428536e-01   1.19956213e-08]
> perr: [ 0.93867933  0.40391117]
>
> Increasing sigma by a factor of 10,
> sigma = 10 * np.array([1,  2, 1,  2, 1])
> I get the following result:
> *** sigma = [10 20 10 20 10] ***
> popt: [  5.71428580e-01  -2.27625699e-09]
> perr: [ 0.93895295  0.37079075]
>
> The best-fit values stayed the same as expected.
> But the error on the slope b decreased by 8% (the error on the offset a
> didn't change much)
> I would have expected fit parameter errors to increase with increasing
> errors on the data!?
> Is this a bug?
>
> Looking at the source code I see that scipy.optimize.curve_fit multiplies
> the pcov obtained from scipy.optimize.leastsq by a factor s_sq:
> https://github.com/scipy/scipy/blob/master/scipy/optimize/minpack.py#L438
>
>     if (len(ydata) > len(p0)) and pcov is not None:
>         s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
>         pcov = pcov * s_sq
>
> If so is it possible to add an explanation to
>
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
> that pcov is multiplied with this s_sq factor and why that will give
> correct errors?
>
> After I noticed this issue I saw that this s_sq factor is mentioned in the
> cov_x return parameter description of leastsq,
> but I think it should be explained in curve_fit where it is applied, maybe
> leaving a reference in the cov_x leastsq description.
>
> Also it would be nice to mention the full_output option in the curve_fit
> docu, I only realized after looking at the source code that this was
> possible.
>
>
Five points, minus two parameters, doesn't give you much accuracy in
estimating the variance, look at the \Chi^2
distribution<http://en.wikipedia.org/wiki/Chi-square_distribution>for
three degrees of freedom. Generally, you would like a few hundred
points
for this sort of thing.

Note that the leastsq documentation about the cov is incorrect, it needs to
be multiplied by the variance fo the residuals, not the standard deviation.

Not to say that there isn't a bug here, just that the evidence is thin.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20110830/a0805f27/attachment.html 


More information about the SciPy-User mailing list