[SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit

Christoph Deil deil.christoph@googlemail....
Wed Aug 31 10:09:20 CDT 2011


On Aug 30, 2011, at 11:25 PM, josef.pktd@gmail.com wrote:

> On Tue, Aug 30, 2011 at 4:15 PM, Christoph Deil
> <deil.christoph@googlemail.com> wrote:
>> I noticed that scipy.optimize.curve_fit returns parameter errors that don't
>> scale with sigma, the standard deviation of ydata, as I expected.
>> Here is a code snippet to illustrate my point, which fits a straight line to
>> five data points:
>> import numpy as np
>> from scipy.optimize import curve_fit
>> x = np.arange(5)
>> y = np.array([1, -2, 1, -2, 1])
>> sigma = np.array([1,  2, 1,  2, 1])
>> def f(x, a, b):
>>     return a + b * x
>> popt, pcov = curve_fit(f, x, y, p0=(0.42, 0.42), sigma=sigma)
>> perr = np.sqrt(pcov.diagonal())
>> print('*** sigma = {0} ***'.format(sigma))
>> print('popt: {0}'.format(popt))
>> print('perr: {0}'.format(perr))
>> I get the following result:
>> *** sigma = [1 2 1 2 1] ***
>> popt: [  5.71428536e-01   1.19956213e-08]
>> perr: [ 0.93867933  0.40391117]
>> Increasing sigma by a factor of 10,
>> sigma = 10 * np.array([1,  2, 1,  2, 1])
>> I get the following result:
>> *** sigma = [10 20 10 20 10] ***
>> popt: [  5.71428580e-01  -2.27625699e-09]
>> perr: [ 0.93895295  0.37079075]
>> The best-fit values stayed the same as expected.
>> But the error on the slope b decreased by 8% (the error on the offset a
>> didn't change much)
>> I would have expected fit parameter errors to increase with increasing
>> errors on the data!?
>> Is this a bug?
> 
> No bug in the formulas. I tested all of them when curve_fit was added.
> 
> However in your example the numerical cov lacks quite a bit of
> precision. Trying your example with different starting values, I get a
> 0.05 difference in your perr (std of parameter estimates).
> 
> Trying smaller xtol and ftol doesn't change anything. (?)

Making ftol = 1e-15 very small I get a different wrong result:
popt: [  5.71428580e-01  -2.27625699e-09]
perr: [ 0.92582011  0.59868281]

What do I have to do to get a correct answer (say to 5 significant digits) from curve_fit for this simple example?

> 
> Since it's linear
> 
>>>> import scikits.statsmodels.api as sm
>>>> x = np.arange(5.)
>>>> y = np.array([1, -2, 1, -2, 1.])
>>>> sigma = np.array([1,  2, 1,  2, 1.])
>>>> res = sm.WLS(y, sm.add_constant(x, prepend=True), weights=1./sigma**2).fit()
>>>> res.params
> array([  5.71428571e-01,   1.11022302e-16])
>>>> res.bse
> array([ 0.98609784,  0.38892223])
> 
>>>> res = sm.WLS(y, sm.add_constant(x, prepend=True), weights=1./(sigma*10)**2).fit()
>>>> res.params
> array([  5.71428571e-01,   1.94289029e-16])
>>>> res.bse
> array([ 0.98609784,  0.38892223])
> 
> rescaling doesn't change parameter estimates nor perr

This is what I don't understand.
Why don't the parameter estimate errors increase with increasing errors sigma on the data points?
If I have less precise measurements, the model parameters should be less constrained?! 

I was using MINUIT before I learned Scipy and the error definition for a chi2 fit given in the MINUIT User Guide
http://wwwasdoc.web.cern.ch/wwwasdoc/minuit/node7.html
as well as the example results here
http://code.google.com/p/pyminuit/wiki/GettingStartedGuide
don't mention the factor s_sq that is used in curve_fit to scale pcov.

Is the error definition in the MINUIT manual wrong?
Can you point me to a web resource that explains why the s_sq factor needs to be applied to the covariance matrix?

> 
> Josef
> 
> 

PS: I've attached a script to fit the two examples using statsmodels, scipy and minuit (applying the s_sq factor myself).

Here are the results I get (who's right for the first example? why does statsmodels only return on parameter value and error?):

    """Example from http://code.google.com/p/pyminuit/wiki/GettingStartedGuide"""
    x = np.array([1  , 2  , 3  , 4  ])
    y = np.array([1.1, 2.1, 2.4, 4.3])
    sigma = np.array([0.1, 0.1, 0.2, 0.1])

statsmodels.api.WLS
popt: [ 1.04516129]
perr: [ 0.0467711]
scipy.optimize.curve_fit
popt: [  8.53964011e-08   1.04516128e+00]
perr: [ 0.27452122  0.09784324]
minuit
popt: [-4.851674617611934e-14, 1.0451612903225629]
perr: [ 0.33828315  0.12647671]

    """Example from http://mail.scipy.org/pipermail/scipy-user/2011-August/030412.html"""
    x = np.arange(5)
    y = np.array([1, -2, 1, -2, 1])
    sigma = 10 * np.array([1,  2, 1,  2, 1])

statsmodels.api.WLS
popt: [  5.71428571e-01   7.63278329e-17]
perr: [ 0.98609784  0.38892223]
scipy.optimize.curve_fit
popt: [  5.71428662e-01  -8.73679511e-08]
perr: [ 0.97804034  0.3818681 ]
minuit
popt: [0.5714285714294132, 2.1449508835758024e-13]
perr: [ 0.98609784  0.38892223]



> 
> 
>> Looking at the source code I see that scipy.optimize.curve_fit multiplies
>> the pcov obtained from scipy.optimize.leastsq by a factor s_sq:
>> https://github.com/scipy/scipy/blob/master/scipy/optimize/minpack.py#L438
>> 
>>     if (len(ydata) > len(p0)) and pcov is not None:
>>         s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
>>         pcov = pcov * s_sq
>> 
>> If so is it possible to add an explanation to
>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
>> that pcov is multiplied with this s_sq factor and why that will give correct
>> errors?
>> After I noticed this issue I saw that this s_sq factor is mentioned in the
>> cov_x return parameter description of leastsq,
>> but I think it should be explained in curve_fit where it is applied, maybe
>> leaving a reference in the cov_x leastsq description.
>> 
>> Also it would be nice to mention the full_output option in the curve_fit
>> docu, I only realized after looking at the source code that this was
>> possible.
>> Christoph
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>> 
>> 
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20110831/1abe6290/attachment-0002.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: chi2_example.py
Type: text/x-python-script
Size: 1802 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/scipy-user/attachments/20110831/1abe6290/attachment-0001.bin 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20110831/1abe6290/attachment-0003.html 


More information about the SciPy-User mailing list