# [SciPy-User] probplot's return value r or r^2?

josef.pktd@gmai... josef.pktd@gmai...
Thu Feb 18 08:55:23 CST 2010

```On Thu, Feb 18, 2010 at 8:59 AM, mikey <abc.mikey@googlemail.com> wrote:
> Hi there,
>
> Hopefully you can help me with a couple of things I'm not getting
> about scipy's stats package. Both of which are shown on this image:
> http://img198.yfrog.com/img198/8928/normalqqplotsection3tot.png
>
> Firstly I want to know what is returned by the r value from probplot?
> The documentation says that it returns r but if you have it plot the
> graph for you it plots r^2. If you compare them you find that the
> returned value r and the plotted result for r^2 are the same, so which
> is it?

In general:
There is a gap in the docstring for linregress: the return value r is
not explained. I fixed probplot so that the graphs work and the results
look "reasonable", but I never verified whether all the results are
correct. I think I verified shapiro against R and that it works in
Monte Carlo, but I have to look at my notes to be sure.

To the question:
probplot uses stats.linregress to estimate the line in the probplot,
and r is taken from it. r is just the correlation coefficient of x and
y. I don't remember offhand how it relates to the regression R^2,
i.e. whether it has to be squared or not.

>>> import numpy as np
>>> from scipy import stats
>>> tmpx, tmpy = np.random.randn(2,10)
>>> tmpx
array([ 0.07649761, -3.12336718, -0.10403981, -1.26699021,  0.91165385,
       -0.78344642,  1.6671443 ,  2.15933311, -1.24495897, -2.67112134])
>>> np.corrcoef(tmpx,tmpy)
array([[ 1.        ,  0.36073832],
       [ 0.36073832,  1.        ]])
>>> stats.linregress(tmpx,tmpy)
(0.24698183423380241, 0.15082677908334857, 0.36073832198115868,
 0.30580467614265466, 0.22576383851968623)
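
# A minimal sketch (not part of the original session) of how r relates
# to the regression R^2: for a straight-line fit with an intercept,
# R^2 equals r**2. The seed, sample size and slope are arbitrary
# assumptions chosen only for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
y = 0.5 * x + rng.standard_normal(50)

slope, intercept, r, p, stderr = stats.linregress(x, y)

# R^2 computed directly from the residuals of the fitted line
resid = y - (intercept + slope * x)
r_squared = 1.0 - resid.var() / y.var()

print(r, np.corrcoef(x, y)[0, 1])   # r is the correlation coefficient
print(r**2, r_squared)              # R^2 is r squared

# probplot returns ((osm, osr), (slope, intercept, r)); its r is the
# correlation of the ordered data with the theoretical quantiles.
(osm, osr), (pp_slope, pp_intercept, pp_r) = stats.probplot(y, dist="norm")
print(pp_r, np.corrcoef(osm, osr)[0, 1])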

>
> Also I'm trying to work out what the p value from the shapiro test is
> showing me. I've plotted the W and p values from the test on my
> probplot for my data and the probplot shows a good correlation
> between my data and the expected distribution for normal data but p is
> too low to accept the null hypothesis that it is normal. Am I
> interpreting what it's showing me properly, or is it just my data?

From the graph it looks like your data is discrete, is it? In that
case, I wouldn't be surprised if the normality hypothesis is rejected.
I'm not familiar with the details of the Shapiro-Wilk test and the
source is in Fortran, but from a brief look at the Wikipedia
description, it seems to be a quite strict test on the quantiles.
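
# A minimal sketch, on made-up data (not the poster's), of how discrete
# values can be rejected by shapiro even though the probplot correlation
# is high. Rounding to integers and the sample size are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# normal draws rounded to integers: only a handful of distinct values
data = np.round(rng.normal(loc=0.0, scale=1.0, size=2000))

(osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm")
W, p_value = stats.shapiro(data)

print("distinct values:", np.unique(data).size)
print("probplot r =", r)
print("Shapiro-Wilk W =", W, " p =", p_value)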

You could also try the other tests in scipy.stats (kstest, normaltest,
anderson, and maybe chisquare) to see whether they also reject the
hypothesis of a normal distribution. (As an aside, Skipper started to
add qqplot and residual tests in statsmodels, but they are still
sandbox code.)
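
# A rough sketch of running several of these tests on one sample.
# `sample` is placeholder data; substitute your own 1-d array.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=10.0, scale=2.0, size=200)

w, p_shapiro = stats.shapiro(sample)
k2, p_normaltest = stats.normaltest(sample)

# kstest compares against a fully specified distribution. Plugging in
# the mean and std estimated from the same data makes its p-value only
# approximate.
d, p_ks = stats.kstest(sample, "norm",
                       args=(sample.mean(), sample.std(ddof=1)))

for name, pval in [("shapiro", p_shapiro),
                   ("normaltest", p_normaltest),
                   ("kstest", p_ks)]:
    print(name, "p-value:", pval)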

I appreciate any feedback on these functions, since they are still
under-documented and I'm not sure whether all the results are correct.
And for some functions, I'm not sure whether they are actually used by
anyone. (For example, the plotting was broken in stats.probplot for
quite some time and I didn't find any comments in the mailing list
archive.)

Josef
