# [SciPy-User] Scipy's probplot compared to R's qqplot

PHobson@Geosynte...
Wed Mar 3 13:49:53 CST 2010

> On Wed, Mar 3, 2010 at 2:09 PM,  <PHobson@geosyntec.com> wrote:
> > Hey folks,
> >
> > I've taken more of an interest in statistics and Scipy lately and decided to compare the scipy.stats.probplot() function to R's qqplot(). For a given dataset, the results are slightly different.
> >
> > Here's a link to the script I wrote to do the comparison.
> > http://dpaste.com/167464/
> >
> > Basically, it does the following:
> > - Uses numpy to generate some fake, normally distributed data
> > - Uses both R and Scipy to compute the values needed for a quantile/probability plot
> > - Computes linear regressions on the quantile data with both R and Scipy
> > - Prints some output to compare the two
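The first two steps of that comparison can be sketched with only the Python standard library, used here instead of numpy and R so the example runs with neither installed (an assumption of this sketch). The helper name `probability_plot_pairs` is hypothetical, and the plotting positions p_i = (i - a)/(n + 1 - 2a) with a = 0.5 are one common convention, not necessarily the one either package uses.

```python
import random
from statistics import NormalDist

def probability_plot_pairs(data, a=0.5):
    """Pair each ordered observation with a standard-normal quantile.

    Hypothetical helper: plotting positions are p_i = (i - a) / (n + 1 - 2a),
    with a = 0.5 assumed for this sketch.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - a) / (n + 1 - 2 * a))
                   for i in range(1, n + 1)]
    return theoretical, ordered

# Fake, normally distributed data; seeded so repeated runs match.
rng = random.Random(42)
data = [rng.gauss(10.0, 2.0) for _ in range(50)]

# osm: theoretical quantiles (x-axis); osr: ordered data (y-axis).
osm, osr = probability_plot_pairs(data)
```

The names `osm` and `osr` mirror the first tuple that scipy.stats.probplot returns (theoretical order-statistic medians and ordered responses); the regression step then fits a line to those pairs.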
> >
> > My initial conclusions:
> > 1) R's lm(y~x) and scipy.stats.linregress(x,y) yield the same slope and intercept of a linear model. (good)
> > 2) R and Scipy compute the quantiles of a dataset in slightly different ways (??)
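Conclusion 1 is expected: with a single predictor, R's `lm(y ~ x)` and `scipy.stats.linregress(x, y)` both estimate the ordinary least-squares line, whose closed form is sketched below (the function name `ols_fit` is hypothetical). On a probability plot, x is the set of theoretical quantiles, so any difference in how those quantiles are computed (conclusion 2) moves the fitted slope and intercept even though the regression itself agrees.

```python
def ols_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Closed form: slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Points lying exactly on y = 10 + 2x recover that line.
slope, intercept = ols_fit([-1.0, 0.0, 1.0, 2.0], [8.0, 10.0, 12.0, 14.0])
```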
> >
> > Any clue as to why the discrepancy in #2 occurs? Would you consider it a big deal?

> From: scipy-user-bounces@scipy.org [mailto:scipy-user-bounces@scipy.org] On Behalf Of josef.pktd@gmail.com
> I would consider any significant deviation a big deal, unless we know
> that there are differences in the definitions or underlying
> assumptions.
>
> I'm not sure what's going on since I never looked at the details of
> probplot. However, when I plot the quantiles
> >>> plt.plot(np.sort(qR))
> >>> plt.plot(qS[0])
> >>> plt.show()
>
> then the graph looks almost the same except for the first and last point.

Yes. When I plotted them, I could not visually distinguish them (see attached). I forgot to mention that.

> qS[0]-np.sort(qR)
>
> differs in the second decimal, except for the first and last observations.
> My guess would be that there are some differences for example in the
> continuity correction, or similar.
>
> The boundary points, however, look suspicious.
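One plausible explanation, offered as an assumption rather than something checked against the script at the dpaste link: if the R side used `qqnorm()`, its probabilities come from `ppoints()`, i.e. (i - a)/(n + 1 - 2a) with a = 3/8 for n <= 10 and 1/2 otherwise. Current SciPy documentation describes `probplot` as using Filliben's estimate of the uniform order-statistic medians, which gives the first and last points their own formula; whether the 2010 release did the same is not confirmed here. The sketch below computes both sets of normal quantiles with the standard library; `r_ppoints` and `filliben_medians` are hypothetical names.

```python
from statistics import NormalDist

def r_ppoints(n):
    """Probabilities as R's ppoints(n) defines them: (i - a) / (n + 1 - 2a)."""
    a = 3 / 8 if n <= 10 else 0.5
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]

def filliben_medians(n):
    """Filliben's estimate of the uniform order-statistic medians."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return [0.5]
    last = 0.5 ** (1.0 / n)  # separate formula for the two endpoints
    middle = [(i - 0.3175) / (n + 0.365) for i in range(2, n)]
    return [1.0 - last] + middle + [last]

n = 50
z = NormalDist().inv_cdf  # standard normal quantile function
q_r = [z(p) for p in r_ppoints(n)]
q_scipy = [z(p) for p in filliben_medians(n)]

# Absolute gap between the two conventions at each rank; for n = 50 the
# largest gaps are at the first and last points.
diffs = [abs(s - r) for s, r in zip(q_scipy, q_r)]
for rank in (1, 2, n // 2, n - 1, n):
    print(rank, round(diffs[rank - 1], 4))
```

Both conventions are symmetric about the median, so the gaps mirror each other; they shrink toward the middle ranks and are largest at the endpoints, which matches the pattern described in the thread.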

Thanks for looking further into this. When I saw that the slopes and intercepts were different, I immediately inspected just the max and min values (laziness, sorry). If I find some time next week, I'll dig around in the source and see if I can't figure out what's happening at those points.
-Paul H.
[Attachment: prob_plot_test.png (image/png, 23775 bytes): http://mail.scipy.org/pipermail/scipy-user/attachments/20100303/753beaee/attachment-0001.png]