[SciPy-user] Limits of linrgress - underflow encountered in stdtr

josef.pktd@gmai... josef.pktd@gmai...
Tue Jun 9 08:45:49 CDT 2009


On Tue, Jun 9, 2009 at 7:57 AM, wierob<wierob83@googlemail.com> wrote:
> Hi,
>
> for z = 30 my code sample prints
>
> ===== dependency_with_noise =====
> slope: 2.0022556391
> intercept: -0.771428571429
> r^2: 0.953601402677
> p-value: 0.0
> stderr: 0.0258507089053
>
> so I'm just confused that the p-value claims the match is absolutely
> perfect while it is not (although it's pretty close to perfect). I compared
> this result to R (www.r-project.org):
>
>> summary(lm(y~x))
>
> Call:
> lm(formula = y ~ x)
>
> Residuals:
>     Min       1Q   Median       3Q      Max
> -12.2624   0.7325   0.7477   0.7635   7.7511
>
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.77143    0.28728  -2.685  0.00745 **
> x            2.00226    0.02585  77.455  < 2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 3.651 on 598 degrees of freedom
> Multiple R-squared: 0.9094,     Adjusted R-squared: 0.9092
> F-statistic:  5999 on 1 and 598 DF,  p-value: < 2.2e-16
>
>> summary(lm(y~x))$coefficients
>              Estimate Std. Error   t value      Pr(>|t|)
> (Intercept) -0.7714286 0.28728036 -2.685281  7.447975e-03
> x            2.0022556 0.02585071 77.454574 6.009953e-314
>
>
> The intercept, slope (x) and stderr values are equal

good

> but the p-value is
> 6.009953e-314 and r-squared is different. While 6.009953e-314 is small
> enough to say it's 0 and the result is highly significant

*highly significant* ???

What significance level do you want to use to accept the Null when you
are using the result of R?

Note: R initially only reported Pr(>|t|) < 2e-16 ***

There are arguments for reporting any statistics only to a few
decimals. I wonder why?
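
On the r-squared difference mentioned in the quoted text: the two numbers
are consistent if the value printed as "r^2" by the code sample is actually
the correlation coefficient r (scipy.stats.linregress returns r, not r
squared). That is an assumption about the code sample, which is not shown
in this message. A quick check in plain Python, using only the numbers
quoted above:

```python
# Values copied from the output quoted above.
r_printed = 0.953601402677   # printed as "r^2" by the code sample
t_slope = 77.454574          # t value of the slope reported by R
df = 598                     # residual degrees of freedom reported by R

# Assumption: r_printed is the correlation coefficient r, not r squared.
r_squared = r_printed ** 2
print(round(r_squared, 4))   # compare with R's "Multiple R-squared: 0.9094"

# For a regression with one predictor, r^2 = t^2 / (t^2 + df) and F = t^2.
r_squared_from_t = t_slope ** 2 / (t_slope ** 2 + df)
print(round(r_squared_from_t, 4))
print(round(t_slope ** 2))   # compare with R's "F-statistic: 5999"
```

If the assumption holds, squaring the printed value reproduces R's
r-squared, so the two programs agree on this number as well.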

Josef

> ... I just wonder
> whether SciPy decides it's small enough to return 0.0 or whether it returns
> 0.0 because it can't actually compute it. If 0.0 is returned deliberately,
> what's the threshold for this decision? Maybe this behavior should be
> documented.
>
>
> regards
> robert
>
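
On the question of a threshold: the limits of IEEE double precision are
relevant here. The sketch below uses only the Python standard library and
illustrates those limits; it does not show how SciPy computes the p-value
internally.

```python
import sys

p_from_r = 6.009953e-314               # p-value reported by R above

smallest_normal = sys.float_info.min   # about 2.2e-308
smallest_subnormal = 5e-324            # smallest positive double

# R's p-value lies below the normal range, so it is a subnormal number
# with reduced precision; producing such a value typically signals underflow.
is_subnormal = 0.0 < p_from_r < smallest_normal
print(is_subnormal)

# Anything well below the smallest subnormal cannot be represented
# and is rounded to 0.0.
print(smallest_subnormal / 2 == 0.0)

# The "< 2.2e-16" in R's summary appears to match the machine epsilon.
print(sys.float_info.epsilon)
```

So a result of exactly 0.0 need not be a deliberate cut-off: a probability
this small is already at the edge of what a double can hold.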
> josef.pktd@gmail.com wrote:
>> On Mon, Jun 8, 2009 at 5:16 PM, wierob<wierob83@googlemail.com> wrote:
>>
>>> Hi,
>>>
>>>
>>>> turn off numpy.seterr(all="raise")
>>>> as explained in the reply to your previous messages
>>>>
>>>> Josef
>>>>
>>>>
>>> Turning off the error reporting doesn't prevent the error. Thus the
>>> result could be wrong, couldn't it? E.g. a p-value of 0.0 looks suspicious.
>>>
>>>
>>
>> Anything other than a p-value of 0 would be suspicious: you have a
>> perfect fit, and the probability is zero that we observe a slope equal
>> to the estimated slope under the null hypothesis (that the slope is
>> zero). So (loosely speaking) we can reject the null of zero slope with
>> probability 1.
>> The result is not "maybe" wrong, it is correct. Your r_square is 1, and
>> the standard error of the slope estimate is zero.
>>
>>
>> Floating-point calculations with inf are correct (if they don't have a
>> definite answer we get a nan). Dividing a non-zero number by zero has
>> a well-defined result, even if Python raises a ZeroDivisionError.
>>
>>
>> >>> np.array(1)/0.
>> inf
>> >>> 1/(np.array(1)/0.)
>> 0.0
>> >>> np.seterr(all="raise")
>> {'over': 'ignore', 'divide': 'ignore', 'invalid': 'ignore', 'under': 'ignore'}
>> >>> 1/(np.array(1)/0.)
>> Traceback (most recent call last):
>>   File "<pyshell#39>", line 1, in <module>
>>     1/(np.array(1)/0.)
>> FloatingPointError: divide by zero encountered in divide
>>
>> Josef
>> _______________________________________________
>> SciPy-user mailing list
>> SciPy-user@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
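
The quoted session changes the error handling globally with np.seterr. A
minimal sketch of the same behaviour using the np.errstate context manager,
which restores the previous settings on exit, might look as follows. It
assumes a NumPy version that provides np.errstate; the variable names are
only illustrative.

```python
import numpy as np

one = np.float64(1.0)
zero = np.float64(0.0)

# With division errors ignored, dividing by zero gives inf, and 1/inf is 0.0.
with np.errstate(divide="ignore"):
    ratio = one / zero
    inverse = 1.0 / ratio
print(ratio, inverse)

# With division errors raised, the same operation raises FloatingPointError.
divide_raised = False
try:
    with np.errstate(divide="raise"):
        one / zero
except FloatingPointError:
    divide_raised = True
print(divide_raised)

# Underflow is handled the same way: silently rounded to 0.0 when ignored,
# an exception when raised.
tiny = np.float64(1e-200)
with np.errstate(under="ignore"):
    product = tiny * tiny
underflow_raised = False
try:
    with np.errstate(under="raise"):
        tiny * tiny
except FloatingPointError:
    underflow_raised = True
print(product, underflow_raised)
```

The numerical result is the same in both modes; the setting only decides
whether the floating-point condition is reported as an exception.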

