[SciPy-user] scipy.stats.stats mannwhitneyu vs ranksums?

Elias Pampalk elias.pampalk@gmail....
Fri Jul 3 11:06:58 CDT 2009

```
Thanks David!

I did a quick comparison between Matlab/stats (R14SP3), R (2.8.1), and
Python/SciPy (0.7). Maybe this is somehow useful for others too.

(I'm intentionally violating the continuous distribution assumptions.)

Samples:

A1 <-> B: not paired with ties

A2 <-> B: not paired without ties

A1 <-> C: paired with zeros

A2 <-> C: paired without zeros

- Matlab

A1 = 0:19

A2 = A1 + (1:20)./100

B = 0:39

C = [0:14,16:20]

- R

A1 <- 0:19

A2 <- A1 + 1:20/100

B <- 0:39

C <- c(0:14,16:20)

- SciPy (import numpy)

A1 = numpy.arange(20)

A2 = A1 + numpy.arange(1,21)/100.0

B = numpy.arange(40)

C = numpy.array(list(range(15)) + list(range(16,21)))

2 Samples, Not Paired

=====================

(from scipy.stats import stats)

Kruskal-Wallis Test

-------------------

Same p-values for all.

Samples contain ties:

- Matlab: kruskalwallis([A1,B],[A1*0,B*0+1]) = 0.00170615101265

- R: kruskal.test(list(A1,B)) = 0.00170615101265

- R: wilcox.test(A1,B, correct=FALSE) = 0.00170615101265 (+warning: ties)

- SciPy: stats.kruskal(A1,B) = 0.00170615101265

(R: kruskal = wilcox without correction for continuity)

Samples without ties:

- Matlab: kruskalwallis([A2,B], [A2*0,B*0+1]) = 0.00288777919292

- R: kruskal.test(list(A2,B)) = 0.00288777919292

- SciPy: stats.kruskal(A2,B) = 0.00288777919292

Wilcoxon Rank Sum (aka Mann Whitney U) Test

-------------------------------------------

Matlab and R give identical results (but have different defaults for
exact vs. approximate).

SciPy computes approximate results and does not correct for continuity
(changed in version 0.7.1 for stats.mannwhitneyu?).

Samples contain ties:

- Matlab: ranksum(A1,B) = 0.00175235702866

- R: wilcox.test(A1,B) = 0.00175235702866 (+warning: ties)

- R: wilcox.test(A1,B,correct=FALSE) = 0.001706151012654 (+warning: ties)

- SciPy: stats.mannwhitneyu(A1,B)[1]*2 = 0.0017086895586986284

- SciPy: stats.ranksums(A1,B) = 0.0017112312247389294

Samples without ties:

- Matlab: ranksum(A2,B) = 0.00296255173431

- R: wilcox.test(A2,B, exact=FALSE) = 0.00296255173431

- Matlab: ranksum(A2,B,'method','exact') = 0.00246078580826

- R: wilcox.test(A2,B) = 0.00246078580826

- R: wilcox.test(A2,B, exact=FALSE, correct=FALSE) = 0.00288777919292

- SciPy: stats.mannwhitneyu(A2,B)[1]*2 = 0.00288777919292

- SciPy: stats.ranksums(A2,B) = 0.00288777919292

(SciPy: mannwhitneyu = ranksums = kruskal if no ties)

2 Samples, Paired, Wilcoxon Signed Rank Test

============================================

(from scipy.stats import wilcoxon)

Matlab and SciPy do not correct for continuity, whereas R does.

Matlab and R have different defaults for exact/approximate.

Matlab computes exact results even if ties/zeros exist.

With zeros:

- Matlab: signrank(A1,C,'method','approximate') = 0.02534731867747

- R: wilcox.test(A1 - C, correct=FALSE) = 0.02534731867747 (+warnings: ties + zeros)

- Matlab: signrank(A1,C) = 0.06250000000000

- R: wilcox.test(A1 - C) = 0.0368884257070 (+warnings: ties + zeros)

- SciPy: wilcoxon(A1,C) = nan (+error: sample size too small)

Without zeros:

- Matlab: signrank(A2,C,'method','exact') = 0.59581947326660

- R: wilcox.test(A2 - C) = 0.59581947326660

- Matlab: signrank(A2,C) = 0.57548622813650

- R: wilcox.test(A2 - C, exact=FALSE, correct=FALSE) = 0.57548622813650

- SciPy: wilcoxon(A2,C) = 0.57548622813650

- R: wilcox.test(A2 - C, exact=FALSE) = 0.5882844808893

Elias

```
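The approximate rank-sum p-values in the comparison seem to differ only in two switches of the normal approximation: whether the variance of U is corrected for ties, and whether a 0.5 continuity correction is applied. Below is a minimal pure-Python sketch of that approximation so the switches can be toggled by hand. It is not the actual SciPy, R, or Matlab code; the names `midranks` and `rank_sum_pvalue` and the two keyword flags are made up for illustration, and the formulas are the textbook ones.

```python
import math

def midranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks

def rank_sum_pvalue(x, y, tie_correction=False, continuity=False):
    """Two-sided p-value from the normal approximation to Mann-Whitney U.

    Hypothetical helper for illustration; the flags are assumptions of this
    sketch, not parameters of any library function.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    # U statistic of the first sample, derived from its rank sum.
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 * (n + 1) / 12.0
    if tie_correction:
        # Shrink the variance by the usual sum(t^3 - t) term over tie groups.
        counts = {}
        for v in pooled:
            counts[v] = counts.get(v, 0) + 1
        tie_term = sum(t ** 3 - t for t in counts.values())
        var_u -= n1 * n2 * tie_term / (12.0 * n * (n - 1))
    distance = abs(u - mean_u)
    if continuity:
        distance = max(distance - 0.5, 0.0)
    z = distance / math.sqrt(var_u)
    # Two-sided normal tail: 2 * (1 - Phi(z)) == erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))

# Same samples as in the post, built with plain lists.
A1 = list(range(20))
A2 = [a + (i + 1) / 100.0 for i, a in enumerate(A1)]
B = list(range(40))

if __name__ == "__main__":
    for name, sample in (("A1", A1), ("A2", A2)):
        for ties in (False, True):
            for cc in (False, True):
                p = rank_sum_pvalue(sample, B, tie_correction=ties, continuity=cc)
                print(name, "tie_correction=%s" % ties, "continuity=%s" % cc, "%.14f" % p)
```

With both flags off, the A1/B case should land on the stats.ranksums figure quoted above. Tie correction alone should give the kruskal / wilcox.test(correct=FALSE) figure, and both flags together the Matlab ranksum / R default figure. For A2/B there are no ties, so only the continuity flag changes the result.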