[SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit
Christoph Deil
deil.christoph@googlemail....
Wed Aug 31 10:09:20 CDT 2011
On Aug 30, 2011, at 11:25 PM, josef.pktd@gmail.com wrote:
> On Tue, Aug 30, 2011 at 4:15 PM, Christoph Deil
> <deil.christoph@googlemail.com> wrote:
>> I noticed that scipy.optimize.curve_fit returns parameter errors that don't
>> scale with sigma, the standard deviation of ydata, as I expected.
>> Here is a code snippet to illustrate my point, which fits a straight line to
>> five data points:
>> import numpy as np
>> from scipy.optimize import curve_fit
>> x = np.arange(5)
>> y = np.array([1, -2, 1, -2, 1])
>> sigma = np.array([1, 2, 1, 2, 1])
>> def f(x, a, b):
>> return a + b * x
>> popt, pcov = curve_fit(f, x, y, p0=(0.42, 0.42), sigma=sigma)
>> perr = np.sqrt(pcov.diagonal())
>> print('*** sigma = {0} ***'.format(sigma))
>> print('popt: {0}'.format(popt))
>> print('perr: {0}'.format(perr))
>> I get the following result:
>> *** sigma = [1 2 1 2 1] ***
>> popt: [ 5.71428536e-01 1.19956213e-08]
>> perr: [ 0.93867933 0.40391117]
>> Increasing sigma by a factor of 10,
>> sigma = 10 * np.array([1, 2, 1, 2, 1])
>> I get the following result:
>> *** sigma = [10 20 10 20 10] ***
>> popt: [ 5.71428580e-01 -2.27625699e-09]
>> perr: [ 0.93895295 0.37079075]
>> The best-fit values stayed the same as expected.
>> But the error on the slope b decreased by 8% (the error on the offset a
>> didn't change much)
>> I would have expected fit parameter errors to increase with increasing
>> errors on the data!?
>> Is this a bug?
>
> No bug in the formulas. I tested all of them when curve_fit was added.
>
> However in your example the numerical cov lacks quite a bit of
> precision. Trying your example with different starting values, I get a
> 0.05 difference in your perr (std of parameter estimates).
>
> Trying smaller xtol and ftol doesn't change anything. (?)
Making ftol = 1e-15 very small I get a different wrong result:
popt: [ 5.71428580e-01 -2.27625699e-09]
perr: [ 0.92582011 0.59868281]
What do I have to do to get a correct answer (say to 5 significant digits) from curve_fit for this simple example?
>
> Since it's linear
>
>>>> import scikits.statsmodels.api as sm
>>>> x = np.arange(5.)
>>>> y = np.array([1, -2, 1, -2, 1.])
>>>> sigma = np.array([1, 2, 1, 2, 1.])
>>>> res = sm.WLS(y, sm.add_constant(x, prepend=True), weights=1./sigma**2).fit()
>>>> res.params
> array([ 5.71428571e-01, 1.11022302e-16])
>>>> res.bse
> array([ 0.98609784, 0.38892223])
>
>>>> res = sm.WLS(y, sm.add_constant(x, prepend=True), weights=1./(sigma*10)**2).fit()
>>>> res.params
> array([ 5.71428571e-01, 1.94289029e-16])
>>>> res.bse
> array([ 0.98609784, 0.38892223])
>
> rescaling doesn't change parameter estimates nor perr
This is what I don't understand.
Why don't the parameter estimate errors increase with increasing errors sigma on the data points?
If I have less precise measurements, the model parameters should be less constrained?!
I was using MINUIT before I learned Scipy and the error definition for a chi2 fit given in the MINUIT User Guide
http://wwwasdoc.web.cern.ch/wwwasdoc/minuit/node7.html
as well as the example results here
http://code.google.com/p/pyminuit/wiki/GettingStartedGuide
don't mention the factor s_sq that is used in curve_fit to scale pcov.
Is the error definition in the MINUIT manual wrong?
Can you point me to a web resource that explains why the s_sq factor needs to be applied to the covariance matrix?
>
> Josef
>
>
PS: I've attached a script to fit the two examples using statsmodels, scipy and minuit (applying the s_sq factor myself).
Here are the results I get (who's right for the first example? why does statsmodels only return on parameter value and error?):
"""Example from http://code.google.com/p/pyminuit/wiki/GettingStartedGuide"""
x = np.array([1 , 2 , 3 , 4 ])
y = np.array([1.1, 2.1, 2.4, 4.3])
sigma = np.array([0.1, 0.1, 0.2, 0.1])
statsmodels.api.WLS
popt: [ 1.04516129]
perr: [ 0.0467711]
scipy.optimize.curve_fit
popt: [ 8.53964011e-08 1.04516128e+00]
perr: [ 0.27452122 0.09784324]
minuit
popt: [-4.851674617611934e-14, 1.0451612903225629]
perr: [ 0.33828315 0.12647671]
"""Example from http://mail.scipy.org/pipermail/scipy-user/2011-August/030412.html"""
x = np.arange(5)
y = np.array([1, -2, 1, -2, 1])
sigma = 10 * np.array([1, 2, 1, 2, 1])
statsmodels.api.WLS
popt: [ 5.71428571e-01 7.63278329e-17]
perr: [ 0.98609784 0.38892223]
scipy.optimize.curve_fit
popt: [ 5.71428662e-01 -8.73679511e-08]
perr: [ 0.97804034 0.3818681 ]
minuit
popt: [0.5714285714294132, 2.1449508835758024e-13]
perr: [ 0.98609784 0.38892223]
>
>
>> Looking at the source code I see that scipy.optimize.curve_fit multiplies
>> the pcov obtained from scipy.optimize.leastsq by a factor s_sq:
>> https://github.com/scipy/scipy/blob/master/scipy/optimize/minpack.py#L438
>>
>> if (len(ydata) > len(p0)) and pcov is not None:
>> s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
>> pcov = pcov * s_sq
>>
>> If so is it possible to add an explanation to
>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
>> that pcov is multiplied with this s_sq factor and why that will give correct
>> errors?
>> After I noticed this issue I saw that this s_sq factor is mentioned in the
>> cov_x return parameter description of leastsq,
>> but I think it should be explained in curve_fit where it is applied, maybe
>> leaving a reference in the cov_x leastsq description.
>>
>> Also it would be nice to mention the full_output option in the curve_fit
>> docu, I only realized after looking at the source code that this was
>> possible.
>> Christoph
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20110831/1abe6290/attachment-0002.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: chi2_example.py
Type: text/x-python-script
Size: 1802 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/scipy-user/attachments/20110831/1abe6290/attachment-0001.bin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20110831/1abe6290/attachment-0003.html
More information about the SciPy-User
mailing list