[SciPy-User] StdErr Problem with Gary Strangman's linregress function

Nuttall, Brandon C bnuttall@uky....
Mon Jan 11 09:47:44 CST 2010


For what it's worth, by the definition of the standard error of the estimate in Crow, Davis, and Maxfield, 1960, Statistics Manual: Dover Publications (p. 156), the Excel function provides the "correct" standard error of the estimate.  Using the notation from Crow, Davis, and Maxfield:

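import numpy as np

# Sample data from the original question
x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])
n = float(len(x))
x2 = x*x
y2 = y*y
xy = x*y
# Sample variances of x and y
s2x = (n*x2.sum()-x.sum()*x.sum())/(n*(n-1.0))
s2y = (n*y2.sum()-y.sum()*y.sum())/(n*(n-1.0))
# Least-squares slope (b) and intercept (a)
b = (n*xy.sum()-x.sum()*y.sum())/(n*x2.sum()-x.sum()*x.sum())
a = (y.sum()-b*x.sum())/n
# Variance of the estimate: residual variance with n-2 degrees of freedom
s2xy = ((n-1.0)/(n-2.0))*(s2y-b*b*s2x)
ste = np.sqrt(s2xy)
r = b*np.sqrt(s2x)/np.sqrt(s2y)
# %.12g keeps 12 significant digits, matching the output shown below
print("intercept:  %.12g" % a)
print("gradient (slope):  %.12g" % b)
print("correlation coefficient, r:  %.12g" % r)
print("std err est:  %.12g" % ste)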

This produces the output:

intercept:  -16.2811279931
gradient (slope):  5.3935773612
correlation coefficient, r:  0.724435142118
std err est:  11.6964144616

The same value for the standard error of the estimate is reported for the sample x,y data by the VassarStats statistical computation web site, http://faculty.vassar.edu/lowry/VassarStats.html.
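
The two numbers are related. linregress reports the standard error of the slope, which is the standard error of the estimate divided by the square root of the sum of squared deviations of x. As a rough cross-check, here is a minimal sketch that assumes that relation and starts from the std_err value quoted in the original message below (the variable names are mine, not from any library):

```python
import math

x = [5.05, 6.75, 3.21, 2.66]
# std_err as reported by stats.linregress in the original message
slope_std_err = 3.6290901222878866

x_mean = sum(x) / len(x)
# Sum of squared deviations of x about its mean
sxx = sum((xi - x_mean) ** 2 for xi in x)

# standard error of the estimate = slope standard error * sqrt(Sxx)
ste = slope_std_err * math.sqrt(sxx)
print("std err est:", ste)
```

The result should agree with the "std err est" value above to within rounding.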

Brandon Nuttall, KRPG-1364
Kentucky Geological Survey
www.uky.edu/kgs
bnuttall@uky.edu (KGS, Mo-We)
Brandon.nuttall@ky.gov (EEC, Th-Fr)
859-257-5500 ext 30544 (main)
859-323-0544 (direct)
859-684-7473 (cell)
859-257-1147 (FAX)

From: scipy-user-bounces@scipy.org [mailto:scipy-user-bounces@scipy.org] On Behalf Of Bruce Southey
Sent: Sunday, January 10, 2010 8:21 PM
To: SciPy Users List
Subject: Re: [SciPy-User] StdErr Problem with Gary Strangman's linregress function


On Sun, Jan 10, 2010 at 3:35 PM, <totalbull@mac.com> wrote:

Hello, Excel and scipy.stats.linregress are disagreeing on the standard error of a regression.

I need to find the standard errors of a bunch of regressions, and I would rather use pure Python than RPy. So I am using scipy.stats.linregress, as advised at:
http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/#linregress


>>> from scipy import stats
>>> x = [5.05, 6.75, 3.21, 2.66]
>>> y = [1.65, 26.5, -5.93, 7.96]
>>> gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
>>> gradient
5.3935773611970186

>>> intercept
-16.281127993087829

>>> r_value
0.72443514211849758

>>> r_value**2
0.52480627513624778

>>> std_err
3.6290901222878866


The problem is that the std error calculation does not agree with what is returned in Microsoft Excel's STEYX function (whereas all the other output does). From Excel:

[Image: image001.png, Excel output]


Does anybody know what's going on? Is there an alternative way of getting the standard error without going to R?




The Excel help is rather cryptic: "Returns the standard error of the predicted y-value for each x in the regression. The standard error is a measure of the amount of error in the prediction of y for an individual x." But clearly this is not the same as the standard error of the 'gradient' (slope) returned by linregress. Without checking the formula, STEYX appears to return the square root of what most people call the mean square error (MSE).
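
To make the distinction concrete, here is a minimal pure-Python sketch that computes both quantities from the residuals. It assumes the MSE is the residual sum of squares divided by n-2; the function name regression_std_errors is my own and is not part of scipy or Excel:

```python
import math

def regression_std_errors(x, y):
    """Return (standard error of the estimate, standard error of the slope)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares about the fitted line
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)  # assumed definition: n-2 degrees of freedom
    return math.sqrt(mse), math.sqrt(mse / sxx)

x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]
ste, slope_se = regression_std_errors(x, y)
print("sqrt(MSE), what STEYX appears to return:", ste)
print("std error of the slope, as in linregress:", slope_se)
```

For the sample data, the first value should be close to 11.696 and the second close to the std_err of 3.629 reported by linregress.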

Bruce