[SciPy-user] error estimate in stats.linregress

josef.pktd@gmai... josef.pktd@gmai...
Mon Feb 23 15:04:00 CST 2009

On Mon, Feb 23, 2009 at 3:01 PM, Bruce Southey <bsouthey@gmail.com> wrote:
> Hi,
> Yes, the formula is incorrect. The reason is that the sum of squares
> terms are not corrected by the means because the ss function just
> computes the uncorrected sum of squares.
> Thus the correct formula should :
> sterrest = np.sqrt(((1-r*r)*(ss((y-ymean))))/(df*(ss(x-xmean))))
> Alternatively:
> sterrest = np.sqrt((1-r*r)*(ss(y)-n*ymean*ymean)/ (ss(x)-n*xmean*xmean)
> / df)
> Note the formula is derived using the definition of R-squared:
> The estimated variance of the slope = MSE/Sxx= ((1-R*R)*Syy)/(df*Sxx)
> where Syy and Sxx are the corrected sums of squares for Y and X,
> respectively,
> Regards
> Bruce
> Hi all,
> I was working with linear regression in scipy and met some problems
> with value of standard error of the estimate returned by
> scipy.stats.linregress() function. I could not compare it to similar
> outputs of other linear regression routines (for example in Origin),
> so I took a look in the source (stats.py).
> In the source it is defined as
> sterrest = np.sqrt((1-r*r)*ss(y) / ss(x) / df)
> where r is correlation coefficient, df is degrees of freedom (N-2) and
> ss() is sum of squares of elements.
> After digging through literature the only formula looking somewhat the
> same was found to be
> stderrest = np.sqrt((1-r*r)*ss(y-y.mean())/df)
> which gives the same result as a standard definition (in notation of
> the source of linregress)
> stderrest = np.sqrt(ss(y-slope*x-intercept)/df)
> but the output of linregress is different.
> I humbly suppose this is a bug, but maybe somebody could explain me
> what is it if I'm wrong...
> Pavlo.

Thank you for reporting and checking this.

I fixed it in trunk, but still have to add a test.
There are still small  (1e-4) numerical differences to the
multivariate ols version in the example I tried


More information about the SciPy-user mailing list