[SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit

josef.pktd@gmai... josef.pktd@gmai...
Tue Aug 30 16:25:21 CDT 2011


On Tue, Aug 30, 2011 at 4:15 PM, Christoph Deil
<deil.christoph@googlemail.com> wrote:
> I noticed that scipy.optimize.curve_fit returns parameter errors that don't
> scale with sigma, the standard deviation of ydata, as I expected.
> Here is a code snippet to illustrate my point, which fits a straight line to
> five data points:
> import numpy as np
> from scipy.optimize import curve_fit
> x = np.arange(5)
> y = np.array([1, -2, 1, -2, 1])
> sigma = np.array([1,  2, 1,  2, 1])
> def f(x, a, b):
>     return a + b * x
> popt, pcov = curve_fit(f, x, y, p0=(0.42, 0.42), sigma=sigma)
> perr = np.sqrt(pcov.diagonal())
> print('*** sigma = {0} ***'.format(sigma))
> print('popt: {0}'.format(popt))
> print('perr: {0}'.format(perr))
> I get the following result:
> *** sigma = [1 2 1 2 1] ***
> popt: [  5.71428536e-01   1.19956213e-08]
> perr: [ 0.93867933  0.40391117]
> Increasing sigma by a factor of 10,
> sigma = 10 * np.array([1,  2, 1,  2, 1])
> I get the following result:
> *** sigma = [10 20 10 20 10] ***
> popt: [  5.71428580e-01  -2.27625699e-09]
> perr: [ 0.93895295  0.37079075]
> The best-fit values stayed the same as expected.
> But the error on the slope b decreased by 8% (the error on the offset a
> didn't change much)
> I would have expected fit parameter errors to increase with increasing
> errors on the data!?
> Is this a bug?

No bug in the formulas. I tested all of them when curve_fit was added.

However in your example the numerical cov lacks quite a bit of
precision. Trying your example with different starting values, I get a
0.05 difference in your perr (std of parameter estimates).

Trying smaller xtol and ftol doesn't change anything. (?)

Since it's linear

>>> import scikits.statsmodels.api as sm
>>> x = np.arange(5.)
>>> y = np.array([1, -2, 1, -2, 1.])
>>> sigma = np.array([1,  2, 1,  2, 1.])
>>> res = sm.WLS(y, sm.add_constant(x, prepend=True), weights=1./sigma**2).fit()
>>> res.params
array([  5.71428571e-01,   1.11022302e-16])
>>> res.bse
array([ 0.98609784,  0.38892223])

>>> res = sm.WLS(y, sm.add_constant(x, prepend=True), weights=1./(sigma*10)**2).fit()
>>> res.params
array([  5.71428571e-01,   1.94289029e-16])
>>> res.bse
array([ 0.98609784,  0.38892223])

rescaling doesn't change parameter estimates nor perr

Josef




> Looking at the source code I see that scipy.optimize.curve_fit multiplies
> the pcov obtained from scipy.optimize.leastsq by a factor s_sq:
> https://github.com/scipy/scipy/blob/master/scipy/optimize/minpack.py#L438
>
>     if (len(ydata) > len(p0)) and pcov is not None:
>         s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
>         pcov = pcov * s_sq
>
> If so is it possible to add an explanation to
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
> that pcov is multiplied with this s_sq factor and why that will give correct
> errors?
> After I noticed this issue I saw that this s_sq factor is mentioned in the
> cov_x return parameter description of leastsq,
> but I think it should be explained in curve_fit where it is applied, maybe
> leaving a reference in the cov_x leastsq description.
>
> Also it would be nice to mention the full_output option in the curve_fit
> docu, I only realized after looking at the source code that this was
> possible.
> Christoph
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>


More information about the SciPy-User mailing list