[SciPy-User] StdErr Problem with Gary Strangman's linregress function
Nuttall, Brandon C
bnuttall@uky....
Mon Jan 11 14:07:02 CST 2010
OK, I think I've figured it out.
The numpy covariance function doesn't seem to return the actual sample variances (it returns a population variance?). This means that in the linregress() function in the stats.py source file, the quantity sterrest is not calculated correctly and needs to be adjusted to the sample variance. In addition, the calculation includes the quantity ssxm (the sum of squares for x?), and I can't find documentation for its inclusion.
# as implemented
# sterrest = np.sqrt((1-r*r)*ssym / ssxm / df)
# should be corrected to
sterrest = np.sqrt((1-r*r)*(ssym*n)/df)
Having made this correction, both the example provided and the example in Crow, Davis, and Maxfield (Table 6.1, p. 154) provide the same value for the standard error of the estimate and the value matches what is calculated by Excel.
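For reference, the two expressions above can be compared numerically on the data from the original question. This is only a sketch: it assumes that ssxm and ssym are the population (divide-by-n) variances of x and y and that df = n - 2, which the thread suggests but does not confirm. np.var and np.corrcoef stand in here for whatever stats.py actually computes.

```python
import numpy as np

# Data from the original question in this thread
x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])

n = len(x)
df = n - 2  # assumed degrees of freedom for a straight-line fit

# Assumption: ssxm and ssym are population variances (divide by n, ddof=0)
ssxm = np.var(x)
ssym = np.var(y)
r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient

# Expression as implemented: scales with 1/ssxm
as_implemented = np.sqrt((1 - r * r) * ssym / ssxm / df)

# Proposed expression: drops ssxm and multiplies ssym by n
as_proposed = np.sqrt((1 - r * r) * (ssym * n) / df)

print(as_implemented)
print(as_proposed)
```

Under those assumptions the first value reproduces the std_err reported by linregress for this data (about 3.629), and the second reproduces the residual-based figure of about 11.696 computed elsewhere in this thread.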
I don't know anything about SVN or submitting a correction, so someone will have to help me out or do it for me.
Thanks.
Brandon
Brandon Nuttall, KRPG-1364
Kentucky Geological Survey
www.uky.edu/kgs
bnuttall@uky.edu (KGS, Mo-We)
Brandon.nuttall@ky.gov (EEC, Th-Fr)
859-257-5500 ext 30544 (main)
859-323-0544 (direct)
859-684-7473 (cell)
859-257-1147 (FAX)
From: scipy-user-bounces@scipy.org [mailto:scipy-user-bounces@scipy.org] On Behalf Of josef.pktd@gmail.com
Sent: Sunday, January 10, 2010 8:41 PM
To: SciPy Users List
Subject: Re: [SciPy-User] StdErr Problem with Gary Strangman's linregress function
On Sun, Jan 10, 2010 at 8:21 PM, Bruce Southey <bsouthey@gmail.com> wrote:
On Sun, Jan 10, 2010 at 3:35 PM, <totalbull@mac.com> wrote:
Hello, Excel and scipy.stats.linregress are disagreeing on the standard error of a regression.
I need to find the standard errors of a bunch of regressions, and would rather use pure Python than RPy. So I am using scipy.stats.linregress, as advised at:
http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/#linregress
>>> from scipy import stats
>>> x = [5.05, 6.75, 3.21, 2.66]
>>> y = [1.65, 26.5, -5.93, 7.96]
>>> gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
>>> gradient
5.3935773611970186
>>> intercept
-16.281127993087829
>>> r_value
0.72443514211849758
>>> r_value**2
0.52480627513624778
>>> std_err
3.6290901222878866
The problem is that the std error calculation does not agree with what is returned in Microsoft Excel's STEYX function (whereas all the other output does). From Excel:
[attached image: image001.png]
Does anybody know what's going on? Is there an alternative way of getting the standard error without going to R?
_______________________________________________
SciPy-User mailing list
SciPy-User@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
The Excel help is rather cryptic: "Returns the standard error of the predicted y-value for each x in the regression. The standard error is a measure of the amount of error in the prediction of y for an individual x." But clearly this is not the same as the standard error of the 'gradient' (slope) returned by linregress. Without checking the formula, STEYX appears to return the square root of what most people call the mean square error (MSE).
Bruce
>>> import numpy as np
>>> gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
>>> ((np.array(y)-intercept-np.array(x)*gradient)**2).sum()/(4.-2.)  # sum of squared residuals / (n - 2)
136.80611125682617
>>> np.sqrt(_)
11.6964144615701
I think this should be the estimate of the standard deviation of the noise/error term.
Josef
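The residual-based calculation above can be wrapped in a small function so that n is not hard-coded. The sketch below is illustrative only: the name steyx is a hypothetical helper (it is not a SciPy or NumPy function), and np.polyfit is used for the straight-line fit in place of stats.linregress. It also shows how this figure relates to the standard error of the slope for the same data.

```python
import numpy as np

def steyx(x, y):
    """Hypothetical helper: standard deviation of the residuals of a
    straight-line fit, using n - 2 degrees of freedom."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 least-squares fit
    resid = y - (intercept + slope * x)
    return np.sqrt((resid ** 2).sum() / (len(x) - 2))

x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]

resid_sd = steyx(x, y)

# Dividing by the square root of the sum of squared x deviations
# gives the standard error of the slope.
sxx = ((np.asarray(x) - np.mean(x)) ** 2).sum()
slope_se = resid_sd / np.sqrt(sxx)

print(resid_sd)
print(slope_se)
```

For the data in this thread, resid_sd comes out at about 11.696 and slope_se at about 3.629, so the two numbers under discussion appear to be different quantities derived from the same fit rather than two attempts at the same one.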
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 1973 bytes
Desc: image001.png
Url : http://mail.scipy.org/pipermail/scipy-user/attachments/20100111/8a142346/attachment.png