[SciPy-User] Scipy's probplot compared to R's qqplot
Wed Mar 3 13:36:39 CST 2010
On Wed, Mar 3, 2010 at 2:09 PM, <PHobson@geosyntec.com> wrote:
> Hey folks,
> I've taken more of an interest in statistics and Scipy lately and decided to compare the scipy.stats.probplot() function to R's qqplot(). For a given dataset, the results are slightly different.
> Here's a link to the script I wrote to do the comparison.
> Basically, it does the following:
> -Uses numpy to generate some fake, normally distributed data
> -Uses both R and Scipy to compute the values needed for a quantile/probability plot
> -Computes linear regressions on the quantile data with both R and Scipy.
> -Prints some output to compare the two
> My initial conclusions:
> 1) R's lm(y~x) and scipy.stats.linregress(x,y) yield the same slope and intercept of a linear model. (good)
> 2) R and Scipy compute the quantiles of a dataset in slightly different manners (??)
> Any clue as to why the discrepancy in #2 occurs? Would you consider it a big deal?
I would consider any significant deviation a big deal, unless we know
that there are differences in the definitions or underlying
I'm not sure what's going on since I never looked at the details of
probplot. However, when I plot the quantiles
then the graph looks almost the same except for the first and last point.
The values differ in the second decimal, except for the first and last observations.
My guess is that there are some differences, for example in the
continuity correction or something similar.
The boundary points, however, look suspicious.
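
One way to test that guess is to compare the plotting positions directly. The sketch below does not call R or scipy. It re-implements two formulas in plain Python, on the assumption that the R side takes normal quantiles at the positions documented for ppoints() (as qqnorm does) and that probplot uses Filliben's estimate of the order statistic medians (as current scipy documents; I have not checked 0.7.1). The helper names and n = 20 are only illustrative.

```python
from statistics import NormalDist


def r_ppoints(n):
    """Plotting positions following the formula documented for R's ppoints(n)."""
    a = 3.0 / 8.0 if n <= 10 else 0.5
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def filliben_positions(n):
    """Filliben's estimate of the uniform order statistic medians.

    Assumed here to be what probplot uses; the first and last points
    get a special formula, the interior points a linear one.
    """
    p = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    p[-1] = 0.5 ** (1.0 / n)
    p[0] = 1.0 - p[-1]
    return p


def normal_quantiles(positions):
    """Map probabilities to standard normal quantiles."""
    inv_cdf = NormalDist().inv_cdf
    return [inv_cdf(p) for p in positions]


n = 20  # illustrative sample size
q_r = normal_quantiles(r_ppoints(n))
q_f = normal_quantiles(filliben_positions(n))
diffs = [abs(a - b) for a, b in zip(q_r, q_f)]

# Print the absolute difference in theoretical quantiles per order statistic.
for i, d in enumerate(diffs, start=1):
    print(f"{i:2d}  {d:.4f}")
```

If these assumptions hold, the two conventions give slightly different theoretical quantiles everywhere, with the largest gap at the two boundary points, where Filliben's formula is a special case.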
Thanks for checking this,
> Python v2.6.2 (XP) and v2.6.4 (Karmic and Snow Leopard)
> Scipy v0.7.1
> Numpy v1.4.0
> R v2.10.0
> Rpy2 v2.0.8
> -Paul H.