[SciPy-User] Scipy's probplot compared to R's qqplot

Robert Kern robert.kern@gmail....
Wed Mar 3 13:38:21 CST 2010

On Wed, Mar 3, 2010 at 13:09,  <PHobson@geosyntec.com> wrote:
> Hey folks,
> I've taken more of an interest in statistics and Scipy lately and decided to compare the scipy.stats.probplot() function to R's qqplot(). For a given dataset, the results are slightly different.
> Here's a link to the script I wrote to do the comparison.
> http://dpaste.com/167464/
> Basically, it does the following:
> -Uses numpy to generate some fake, normally distributed data
> -Uses both R and Scipy to compute the values needed for a quantile/probability plot
> -Computes linear regressions on the quantile data with both R and Scipy
> -Prints some output to compare the two
> My initial conclusions:
> 1) R's lm(y~x) and scipy.stats.linregress(x,y) yield the same slope and intercept of a linear model. (good)
> 2) R and Scipy compute the quantiles of a dataset in slightly different manners (??)
> Any clue as to why the discrepancy in #2 occurs?

There are several, slightly different but mostly reasonable ways of
computing quantiles.
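
To make that concrete, here is a minimal sketch of two such conventions.
It assumes the SciPy side uses Filliben's estimate of the order-statistic
medians (which is what the scipy.stats.probplot() docstring describes) and
that the R side goes through ppoints(), as qqnorm() does. I have not
checked what the linked script actually calls, so treat both as
assumptions. The helper names are made up for illustration, and the
standard library's NormalDist().inv_cdf stands in for
scipy.stats.norm.ppf.

```python
from statistics import NormalDist


def filliben_positions(n):
    """Plotting positions from Filliben's estimate (assumed SciPy convention)."""
    top = 0.5 ** (1.0 / n)
    # Interior points: (i - 0.3175) / (n + 0.365) for i = 1..n
    pos = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    # The two end points are overridden with special values.
    pos[0] = 1.0 - top
    pos[-1] = top
    return pos


def ppoints(n, a=None):
    """Plotting positions in the style of R's ppoints() (assumed R convention)."""
    if a is None:
        # R's default offset: 3/8 for small samples, 1/2 otherwise.
        a = 3.0 / 8.0 if n <= 10 else 0.5
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def normal_quantiles(positions):
    """Map plotting positions to standard-normal theoretical quantiles."""
    inv = NormalDist().inv_cdf
    return [inv(p) for p in positions]


n = 20
scipy_style = normal_quantiles(filliben_positions(n))
r_style = normal_quantiles(ppoints(n))

# Largest disagreement between the two sets of theoretical quantiles.
max_gap = max(abs(s - r) for s, r in zip(scipy_style, r_style))
print("largest gap in theoretical quantiles:", max_gap)
print("lowest quantile, Filliben vs ppoints:", scipy_style[0], r_style[0])
```

Under these assumptions the two conventions nearly agree in the middle of
the sample and differ most at the extreme order statistics.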

> Would you consider it a big deal?

Probably not, but I'm happy to entertain arguments to the contrary if
you would care to explain how R is computing the quantiles.
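
One rough way to judge how much it matters is to fit the same sorted
sample against both sets of theoretical quantiles and compare the fitted
lines. The sketch below does that with made-up data (a seeded
random.Random, mean 10, standard deviation 2) and the same two assumed
conventions: Filliben's estimate for SciPy and ppoints() for R. The
function names are hypothetical and the least-squares fit is written out
by hand rather than calling scipy.stats.linregress.

```python
import random
from statistics import NormalDist, fmean


def quantiles(n, convention):
    """Standard-normal theoretical quantiles under an assumed convention."""
    inv = NormalDist().inv_cdf
    if convention == "filliben":  # assumed SciPy behaviour
        top = 0.5 ** (1.0 / n)
        pos = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
        pos[0], pos[-1] = 1.0 - top, top
    else:  # "ppoints", assumed R behaviour
        a = 0.375 if n <= 10 else 0.5
        pos = [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]
    return [inv(p) for p in pos]


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


# Made-up sample: 100 draws from a normal distribution, sorted.
rng = random.Random(0)
sample = sorted(rng.gauss(10.0, 2.0) for _ in range(100))

slope_f, icpt_f = fit_line(quantiles(100, "filliben"), sample)
slope_p, icpt_p = fit_line(quantiles(100, "ppoints"), sample)

rel_slope_gap = abs(slope_f - slope_p) / abs(slope_p)
print("slopes:", slope_f, slope_p)
print("intercepts:", icpt_f, icpt_p)
print("relative slope difference:", rel_slope_gap)
```

Both sets of quantiles are symmetric about zero, so the intercepts should
agree almost exactly; any difference shows up in the slope.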

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
