# [SciPy-Dev] On the leastsq/curve_fit method

Bruce Southey bsouthey@gmail....
Tue Sep 27 11:53:52 CDT 2011

On Tue, Sep 27, 2011 at 9:20 AM, <josef.pktd@gmail.com> wrote:

>
>
> On Tue, Sep 27, 2011 at 5:17 AM, Gianfranco Durin <g.durin@inrim.it>wrote:
>
>>
>> > where func can be both "_general_function" and
>> > "_weighted_general_function", is not correct.
>> >
>> >
>> > M = σ 2 I ok unit weights
>> >
>> > M = W^(-1) your case, W has the estimates of the error covariance
>> >
>> > M = σ 2 W^(-1) I think this is what curve_fit uses, and what is in
>> > (my) textbooks defined as weighted least squares
>> >
>> > Do we use definition 2 or 3 by default? both are reasonable
>> >
>> > 3 is what I expected when I looked at curve_fit
>> > 2 might be more useful for two stage estimation, but I didn't have
>> > time to check the details
>>
>> Ehmm, no, curve_fit does not use def 3, as the errors would scale with W,
>> but they don't. By the way, it does not have the correct units.
>>
>
> http://wwwasdoc.web.cern.ch/wwwasdoc/minuit/node31.html
>
> '''
> The minimization of [image: $\chi^{2}_{}$] above is sometimes called *weighted
> least  squares* in which case the inverse quantities 1/*e*2 are called the
> weights. Clearly this is simply a different word for the same thing, but in
> practice the use of these words sometimes means that the interpretation of
> *e*2 as variances or squared errors is not straightforward. The word
> weight often implies that only the relative weights are known (point two
> is twice as important as point one'') in which case there is apparently an
> unknown overall normalization factor. Unfortunately the parameter errors
> coming out of such a fit will be proportional to this factor, and the user
> must be aware of this in the formulation of his problem.
> '''
> (I don't quite understand the last sentence.)
>
> M = σ^2 W^(-1), where σ^2 is estimated by residual sum of squares from
> weighted regression.
>
> W only specifies relative errors, the assumption is that the covariance
> matrix of the errors is *proportional* to W. The scaling is arbitrary. If
> the scale of W changes, then the estimated residual sum of squares from
> weighted regression will compensate for it. So, rescaling W has no effect on
> the covariance of the parameter estimates.
>
> I checked in Greene: Econometric Analysis, and briefly looked at the SAS
> description. It looks like weighted least squares is always with automatic
> scaling, W is defined only as relative weights.
>
> All I seem to be able to find is weighted least squares with automatic
> scaling (except for maybe some two-step estimators).
>
>
>>
>> Curve_fit calculates
>>
>> M = W \sigma^2 W^(-1) = \sigma^2
>>
>
> If I remember correctly (calculated from the transformed model) it should
> be:
>
> the cov of the parameter estimates is s^2 (X'WX)^(-1)
> error estimates should be s^2 * W
>
> where W = diag(1/curvefit_sigma**2)   unfortunate terminology for
> curve_fit's sigma or intentional ? (as I mentioned in the other thread)
>
> Josef
>
>
>>
>> so it gives exactly the same results of case 1, irrespective the W's. This
>> is why the errors do not scale with W.
>>
>> Gianfranco
>> _______________________________________________
>>
>
Gianfranco,
Can you please provide some Python and R (or SAS) code to show what you
mean?

In the linear example below, I get agreement between SAS and curve_fit with
and without defining a weight. I do know that the immediate result from
leastsq does not agree with values from SAS. Thus, with my limited
knowledge, I consider that the documentation is correct.

import numpy as np
from scipy.optimize import curve_fit
from scipy.optimize import leastsq
x=np.array([200, 400, 300, 400, 200, 300, 300, 400, 200, 400, 200, 300],
dtype=float)
y=np.array([28, 75, 37, 53, 22, 58, 40, 96, 34, 52, 30, 69], dtype=float)
w=np.array([0.001168822, 0.000205547, 0.000408122, 0.000205547, 0.001168822,
0.000408122, 0.000408122, 0.000205547, 0.001168822, 0.000205547,
0.001168822, 0.000408122])
sw=1/np.sqrt(w)

def linfunc(x, a, b):
return a + b*x
popt, pcov = curve_fit(linfunc, x, y)
wopt, wcov = curve_fit(linfunc, x, y, sigma=sw)

Output:
No weighted form usage of curve_fit which matches SAS:
a= -11.25 with SE= 15.88585587
b=0.2025 with SE=0.05109428

Weighted form if I understand the difference between how weights need to be
specified so the results match SAS:
a= -12.91438  with SE= 11.09325
b=0.20831 with SE= 0.04342

Bruce
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-dev/attachments/20110927/1c905b9f/attachment.html