[SciPy-user] linear regression

josef.pktd@gmai... josef.pktd@gmai...
Wed May 27 20:39:29 CDT 2009


On Wed, May 27, 2009 at 9:03 PM, Robert Kern <robert.kern@gmail.com> wrote:
> On Wed, May 27, 2009 at 19:40,  <josef.pktd@gmail.com> wrote:
>
>> The variance of the error term in the regression equation is a linear
>> combination of the true error (p_sy) and the measurement error in x
>> (p_sx)
>>
>> So the correct weighting would be according to  p_sy**2 + beta**2 *
>> p_sx**2, which is in practice not possible since we don't know beta,
>> or maybe iterative would work  (at least something like this should be
>> correct)
>
> Yes! This is precisely what ODR does for you in the linear case, all
> in one shot.

But if I have another instrument/measurement for x, I know how to use
Instrumental Variables regression, and I can remove the bias caused by
the correlation between x and the regression error.
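A minimal sketch of that instrumental-variables idea (two-stage least squares) in plain numpy, with synthetic data; the instrument z here is a second noisy measurement of x, and all names and values are illustrative, not from the thread:

```python
import numpy as np

rng = np.random.RandomState(0)
n = 10000
beta_true = np.array([-0.5, 5.5])       # slope, intercept (illustrative)

x_true = rng.uniform(0, 10, n)
x = x_true + rng.normal(0, 1.0, n)      # observed regressor with measurement error
z = x_true + rng.normal(0, 1.0, n)      # instrument: independent second measurement
y = beta_true[0] * x_true + beta_true[1] + rng.normal(0, 0.5, n)

def add_const(v):
    A = np.ones((len(v), 2))
    A[:, 0] = v
    return A

# naive OLS: slope attenuated toward zero because x is error-ridden
b_ols = np.linalg.lstsq(add_const(x), y, rcond=None)[0]

# 2SLS: first stage projects x on the instrument, second stage uses the fit
Z = add_const(z)
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
b_iv = np.linalg.lstsq(add_const(x_hat), y, rcond=None)[0]
```

The IV slope should sit close to the true -0.5, while the naive OLS slope is visibly attenuated.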

I don't doubt its ability to handle some cases very well; what I
meant was more that as an estimation framework it hasn't caught on.
There is a huge literature on least squares and, more appropriate for
this case, on generalized method of moments, but except for a chapter
in an econometrics textbook I haven't seen much on orthogonal least
squares. My impression is that this is because the method takes
care of the measurement errors (semi)automatically: it requires less
explicit modeling of the error structure, but it is also more limited in
incorporating additional information on the stochastic properties of
the regressors.
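For reference, the one-shot linear errors-in-variables fit with scipy.odr looks roughly like this (a sketch on synthetic data; the variable names and true values are made up for illustration):

```python
import numpy as np
from scipy import odr

rng = np.random.RandomState(1)
n = 1000
p_sx, p_sy = 0.5, 0.5                   # known measurement-error std devs
x_true = rng.uniform(0, 10, n)
x = x_true + rng.normal(0, p_sx, n)
y = -0.5 * x_true + 5.5 + rng.normal(0, p_sy, n)

def linear(beta, x):
    return beta[0] * x + beta[1]

# RealData takes the measurement-error standard deviations directly,
# so the sy**2 + beta**2 * sx**2 weighting is handled internally
data = odr.RealData(x, y, sx=p_sx, sy=p_sy)
out = odr.ODR(data, odr.Model(linear), beta0=[1.0, 0.0]).run()
```

out.beta holds [slope, intercept] and should come out close to (-0.5, 5.5).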

Using a simple 2-step iteration to estimate the weights, I get
almost the same mean squared error as ODR, but the bias stays higher,
which I don't understand.

Josef


       # Weighted OLS:
       # infeasible weights from the true beta (overwritten below)
       p_su = np.sqrt((beta_true[0] * p_sx)**2 + p_sy**2)
       # step 1: unweighted OLS to get a first estimate of beta
       A = np.ones((len(x), 2))
       A[:, 0] = x
       b, res, rank, s = np.linalg.lstsq(A, y)
       # step 2: feasible weights from the estimated slope, then weighted OLS
       p_su = np.sqrt((b[0] * p_sx)**2 + p_sy**2)
       A2 = A / p_su[:, np.newaxis]
       b, res, rank, s = np.linalg.lstsq(A2, y / p_su)
       ols_betas.append(b)
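Stand-alone, that two-step weighting looks like the sketch below (synthetic data, illustrative names). Note that the weighting only corrects the heteroscedasticity of the combined error p_sy**2 + beta**2 * p_sx**2; it does not remove the attenuation bias from regressing on the error-ridden x, which may be why the bias stays above ODR's:

```python
import numpy as np

rng = np.random.RandomState(2)
n = 10000
beta_true = np.array([-0.5, 5.5])
x_true = rng.uniform(0, 10, n)
p_sx = rng.uniform(0.1, 0.6, n)   # heteroscedastic measurement-error scales
p_sy = rng.uniform(0.1, 0.6, n)
x = x_true + p_sx * rng.normal(size=n)
y = beta_true[0] * x_true + beta_true[1] + p_sy * rng.normal(size=n)

A = np.ones((n, 2))
A[:, 0] = x

# step 1: unweighted OLS gives a first slope estimate
b1 = np.linalg.lstsq(A, y, rcond=None)[0]

# step 2: weight each observation by its implied error std and refit
p_su = np.sqrt((b1[0] * p_sx)**2 + p_sy**2)
b2 = np.linalg.lstsq(A / p_su[:, np.newaxis], y / p_su, rcond=None)[0]
```

With the modest error scales used here the attenuation is small, so b2 lands near beta_true, but it shrinks toward zero as p_sx grows relative to the spread of x.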

----------
bodr,bols = random_betas(10000)
print "Estimate"
print "odr", bodr.mean(1)
print "ols", bols.mean(1)
print "Bias"
print "odr", bodr.mean(1)-beta_true
print "ols", bols.mean(1)-beta_true
print "MSE"
print "odr", ((bodr-beta_true[:,np.newaxis])**2).mean(1)
print "ols", ((bols-beta_true[:,np.newaxis])**2).mean(1)


Estimate
odr [-0.48114078  5.48110708]
ols [-0.46611397  5.40642845]
Bias
odr [-0.0015325   0.00436683]
ols [ 0.01349432 -0.07031179]
MSE
odr [ 0.00328421  0.08457911]
ols [ 0.0035286   0.09229023]

