[SciPy-User] StdErr Problem with Gary Strangman's linregress function
totalbull@ma...
Mon Jan 11 15:34:19 CST 2010
Not a problem, Josef. Luckily, the new output was wildly different, so it was easy to spot that something had changed. Thanks again for all the help.
Tom
On 11 Jan 2010, at 21:19, josef.pktd@gmail.com wrote:
> On Mon, Jan 11, 2010 at 3:08 PM, <totalbull@mac.com> wrote:
>> Thanks very much to all who have helped with this.
>> I am going to go with the first-principles formulae as per below.
>> I also asked on Stack Overflow, where one person answered with a
>> scikits example:
>> http://stackoverflow.com/questions/2038667/scipy-linregress-function-erroneous-standard-error-return
>
> If the old version of linregress matched Excel, as you say, then I
> unintentionally changed the meaning of this value in response to a
> previous bug report (see http://projects.scipy.org/scipy/ticket/874).
>
> It's sometimes difficult to figure out what a value is supposed to be
> when there is neither sufficient documentation nor a test for it. I had
> the numbers from linregress verified against statsmodels, but the
> standard error it returns simply means something different from
> Excel's definition.
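To make the difference concrete: for simple regression, the standard error of the slope and the standard error of the estimate (Excel's STEYX) are related by SE(b) = s_yx / sqrt(Sxx), where Sxx = sum((x - mean(x))**2). The minimal numpy sketch below converts the `std_err` value quoted later in this thread into the Excel figure. It assumes that `std_err` is the slope's standard error (as Bruce notes further down); `steyx_from_slope_se` is a hypothetical helper name, not part of SciPy.

```python
import numpy as np

def steyx_from_slope_se(x, slope_se):
    """Hypothetical helper: convert the standard error of the slope into
    the standard error of the estimate (the quantity Excel's STEYX reports).

    For simple regression SE(b) = s_yx / sqrt(Sxx), so s_yx = SE(b) * sqrt(Sxx).
    """
    x = np.asarray(x, dtype=float)
    sxx = np.sum((x - x.mean()) ** 2)  # sum of squared deviations of x
    return slope_se * np.sqrt(sxx)

x = [5.05, 6.75, 3.21, 2.66]
std_err = 3.6290901222878866  # std_err reported by linregress in this thread
print(steyx_from_slope_se(x, std_err))  # approx. 11.6964, the STEYX value
```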
>
> But as Skipper said, for all but the simplest regression case,
> scikits.statsmodels is much more general and produces more results.
>
> Josef
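For a rough idea of what a general ordinary-least-squares routine reports, here is a numpy-only sketch. It is not the statsmodels API; the `ols` helper is hypothetical. The coefficient standard errors come from the diagonal of s^2 (X'X)^-1, where s^2 is the residual variance with n - k degrees of freedom. With an intercept column plus x as the design matrix, it reproduces both numbers discussed in this thread.

```python
import numpy as np

def ols(X, y):
    """Hypothetical OLS helper for a design matrix X with n rows and k columns.

    Returns (coefficients, coefficient standard errors, std. error of estimate).
    A sketch only: no rank or conditioning checks are done.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Residual variance with n - k degrees of freedom
    s2 = resid @ resid / (n - k)
    # Covariance matrix of the coefficient estimates: s2 * (X'X)^-1
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov)), np.sqrt(s2)

x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])
X = np.column_stack([np.ones_like(x), x])  # intercept column plus x
beta, se, ste = ols(X, y)
print("intercept, slope:", beta)
print("coefficient std errors:", se)  # se[1] is the slope SE (linregress std_err)
print("std err of estimate:", ste)    # the value Excel's STEYX reports
```

Adding more columns to `X` gives the multiple-regression case, which is where a full package such as statsmodels becomes the better choice.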
>
>
>
>>
>> On 11 Jan 2010, at 15:47, Nuttall, Brandon C wrote:
>>
>> For what it’s worth, using the definition of the standard error of the
>> estimate in Crow, Davis, and Maxfield, 1960, Statistics Manual: Dover
>> Publications (p. 156), the Excel function provides the “correct” standard
>> error of the estimate. Using notation from Crow, Davis, and Maxfield:
>>
>> import numpy as np
>>
>> x = np.array([5.05, 6.75, 3.21, 2.66])
>> y = np.array([1.65, 26.5, -5.93, 7.96])
>> n = float(len(x))  # number of observations (4.0 here)
>> x2 = x*x
>> y2 = y*y
>> xy = x*y
>> # Sample variances of x and y
>> s2x = (n*x2.sum() - x.sum()*x.sum())/(n*(n-1.0))
>> s2y = (n*y2.sum() - y.sum()*y.sum())/(n*(n-1.0))
>> # Least-squares slope b and intercept a
>> b = (n*xy.sum() - x.sum()*y.sum())/(n*x2.sum() - x.sum()*x.sum())
>> a = (y.sum() - b*x.sum())/n
>> # Squared standard error of the estimate (residual variance, n-2 dof)
>> s2xy = ((n-1.0)/(n-2.0))*(s2y - b*b*s2x)
>> ste = np.sqrt(s2xy)
>> # Pearson correlation coefficient
>> r = b*np.sqrt(s2x)/np.sqrt(s2y)
>> print("intercept: %.12g" % a)
>> print("gradient (slope): %.12g" % b)
>> print("correlation coefficient, r: %.12g" % r)
>> print("std err est: %.12g" % ste)
>>
>> Produces the output:
>>
>> intercept: -16.2811279931
>> gradient (slope): 5.3935773612
>> correlation coefficient, r: 0.724435142118
>> std err est: 11.6964144616
>>
>> The same value for the standard error of the estimate is reported for this
>> sample x, y data by the VassarStats statistical computation web site,
>> http://faculty.vassar.edu/lowry/VassarStats.html.
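The `s2xy` expression in the code above is the residual variance with n - 2 degrees of freedom, rewritten in terms of the sample variances. With S_xx = (n - 1)s_x^2, S_yy = (n - 1)s_y^2 and the least-squares identity SSE = S_yy - b^2 S_xx, the standard error of the estimate and the standard error of the slope are:

$$
s_{y\cdot x}^2 = \frac{\mathrm{SSE}}{n-2} = \frac{S_{yy} - b^2 S_{xx}}{n-2} = \frac{n-1}{n-2}\left(s_y^2 - b^2 s_x^2\right),
\qquad
\mathrm{SE}(b) = \frac{s_{y\cdot x}}{\sqrt{S_{xx}}}
$$

For this data the first gives 11.696 (the Excel/VassarStats value) and the second gives 3.629 (the `std_err` from linregress).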
>>
>> Brandon Nuttall, KRPG-1364
>> Kentucky Geological Survey
>> www.uky.edu/kgs
>> bnuttall@uky.edu (KGS, Mo-We)
>> Brandon.nuttall@ky.gov (EEC, Th-Fr)
>> 859-257-5500 ext 30544 (main)
>> 859-323-0544 (direct)
>> 859-684-7473 (cell)
>> 859-257-1147 (FAX)
>>
>> From: scipy-user-bounces@scipy.org [mailto:scipy-user-bounces@scipy.org] On
>> Behalf Of Bruce Southey
>> Sent: Sunday, January 10, 2010 8:21 PM
>> To: SciPy Users List
>> Subject: Re: [SciPy-User] StdErr Problem with Gary Strangman's linregress
>> function
>>
>>
>>
>>
>> On Sun, Jan 10, 2010 at 3:35 PM, <totalbull@mac.com> wrote:
>>
>> Hello, Excel and scipy.stats.linregress disagree on the standard
>> error of a regression.
>>
>> I need to find the standard errors of a bunch of regressions, and I prefer
>> pure Python to RPy. So I am using scipy.stats.linregress, as
>> advised at:
>> http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/lin_reg/#linregress
>>
>>
>> from scipy import stats
>> x = [5.05, 6.75, 3.21, 2.66]
>> y = [1.65, 26.5, -5.93, 7.96]
>> gradient, intercept, r_value, p_value, std_err = stats.linregress(x, y)
>>
>> gradient    # 5.3935773611970186
>> intercept   # -16.281127993087829
>> r_value     # 0.72443514211849758
>> r_value**2  # 0.52480627513624778
>> std_err     # 3.6290901222878866
>>
>>
>> The problem is that the standard error calculation does not agree with the
>> value returned by Microsoft Excel's STEYX function (whereas all the other
>> output does). From Excel:
>>
>> [Excel screenshot attachment (image001.png) not preserved]
>>
>>
>> Does anybody know what's going on? Is there another way to get the standard
>> error without going to R?
>>
>>
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>> The Excel help is rather cryptic: "Returns the standard error of the
>> predicted y-value for each x in the regression. The standard error is a
>> measure of the amount of error in the prediction of y for an individual x."
>> But clearly this is not the same as the standard error of the 'gradient'
>> (slope) returned by linregress. Without checking the formula, STEYX appears
>> to return the square root of what most people call the mean square error (MSE).
>>
>> Bruce
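Following Bruce's description, one pure-Python way to get the STEYX value is to fit the line, then take the square root of the residual sum of squares divided by n - 2. This is a numpy-only sketch that assumes that definition; `steyx` is a hypothetical name mirroring the Excel function, not a SciPy API.

```python
import numpy as np

def steyx(x, y):
    """Hypothetical helper: standard error of the estimate, sqrt(SSE / (n - 2))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Least-squares line; for deg=1 np.polyfit returns [slope, intercept]
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    # Mean square error with n - 2 degrees of freedom (two fitted parameters)
    mse = np.sum(resid ** 2) / (n - 2)
    return np.sqrt(mse)

x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]
print(steyx(x, y))
```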