[SciPy-user] scipy.stats.stats mannwhitneyu vs ranksums?

Elias Pampalk elias.pampalk@gmail....
Fri Jul 3 11:06:58 CDT 2009


Thanks David! 

 

I did a quick comparison between Matlab/stats (R14SP3), R (2.8.1), and
Python/SciPy (0.7). Maybe this is somehow useful for others too.

 

(I'm intentionally violating the continuous distribution assumptions.)

 

Samples:

A1 <-> B: not paired, with ties

A2 <-> B: not paired, without ties

 

A1 <-> C: paired, with zeros (zero differences)

A2 <-> C: paired, without zeros

 

- Matlab

      A1 = 0:19

      A2 = A1 + (1:20)./100

      B = 0:39

      C = [0:14,16:20]

 

- R

      A1 <- 0:19

      A2 <- A1 + 1:20/100

      B <- 0:39

      C <- c(0:14,16:20)

 

- SciPy

      import numpy

      A1 = numpy.arange(20)

      A2 = A1 + numpy.arange(1,21)/100.0

      B = numpy.arange(40)

      C = numpy.array(list(range(15)) + list(range(16,21)))
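
To make the "with ties" / "with zeros" labels concrete, here is a minimal sketch that rebuilds the four samples with plain Python lists and counts the shared values and zero differences. The helper names count_ties and count_zeros are made up for this illustration.

```python
# Sample definitions as in the post, written with plain Python lists.
A1 = list(range(20))
A2 = [a + (i + 1) / 100.0 for i, a in enumerate(A1)]
B = list(range(40))
C = list(range(15)) + list(range(16, 21))


def count_ties(x, y):
    """Number of distinct values that occur in both x and y."""
    return len(set(x) & set(y))


def count_zeros(x, y):
    """Number of paired differences x[i] - y[i] that are exactly zero."""
    return sum(1 for a, b in zip(x, y) if a - b == 0)


print("A1 vs B, shared values:", count_ties(A1, B))
print("A2 vs B, shared values:", count_ties(A2, B))
print("A1 - C, zero differences:", count_zeros(A1, C))
print("A2 - C, zero differences:", count_zeros(A2, C))
```

Every value of A1 also occurs in B, and the first 15 paired differences A1 - C are zero, which is what triggers the tie and zero warnings further down.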

 

 

2 Samples, Not Paired

=====================

 

(from scipy.stats import stats)

 

Kruskal-Wallis Test 

-------------------

 

Same p-values for all.

 

Samples contain ties:

 

- Matlab: kruskalwallis([A1,B],[A1*0,B*0+1]) = 0.00170615101265

- R: kruskal.test(list(A1,B)) = 0.00170615101265

- R: wilcox.test(A1,B, correct=FALSE) = 0.00170615101265 (+warning: ties)

- SciPy: stats.kruskal(A1,B) = 0.00170615101265

 

(R: kruskal = wilcox without correction for continuity)

 

Samples without ties:

 

- Matlab: kruskalwallis([A2,B], [A2*0,B*0+1]) = 0.00288777919292

- R: kruskal.test(list(A2,B)) = 0.00288777919292

- SciPy: stats.kruskal(A2,B) = 0.00288777919292
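
As a rough cross-check of the Kruskal-Wallis numbers, the following sketch computes the tie-corrected H statistic for two groups by hand, using only the standard library. It assumes the usual textbook formula; for two groups H has one degree of freedom, so the chi-square tail can be written with math.erfc. The names midranks and kruskal_two_groups are hypothetical and not part of SciPy, R, or Matlab.

```python
import math


def midranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_two_groups(x, y):
    """Return (H, p) for two groups, with the standard tie correction."""
    pooled = list(x) + list(y)
    n = len(pooled)
    ranks = midranks(pooled)
    rx = sum(ranks[:len(x)])
    ry = sum(ranks[len(x):])
    h = 12.0 / (n * (n + 1)) * (rx ** 2 / len(x) + ry ** 2 / len(y)) - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    h /= 1.0 - tie_term / float(n ** 3 - n)
    # Chi-square survival function with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2.0))
    return h, p


A1 = list(range(20))
A2 = [a + (i + 1) / 100.0 for i, a in enumerate(A1)]
B = list(range(40))

print("with ties:   ", kruskal_two_groups(A1, B)[1])
print("without ties:", kruskal_two_groups(A2, B)[1])
```

With two groups, H is the square of the tie-corrected rank sum z statistic without continuity correction, which is why kruskal.test and wilcox.test(..., correct=FALSE) agree above.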

      

 

Wilcoxon Rank Sum (aka Mann Whitney U) Test

-------------------------------------------

 

Matlab and R give identical results (but have different defaults for exact
vs. approximate computation).

SciPy computes approximate results and does not correct for continuity
(changed in version 0.7.1 for stats.mannwhitneyu?).

 

Samples contain ties:

 

- Matlab: ranksum(A1,B) = 0.00175235702866

- R: wilcox.test(A1,B) = 0.00175235702866 (+warning: ties)

 

- R: wilcox.test(A1,B,correct=FALSE) = 0.001706151012654 (+warning: ties)

 

- SciPy: stats.mannwhitneyu(A1,B)[1]*2 = 0.0017086895586986284

 

- SciPy: stats.ranksums(A1,B) = 0.0017112312247389294
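
The spread of values for the tied samples can be traced to two switches in the normal approximation: whether the variance is corrected for ties, and whether a continuity correction of 0.5 is applied. The sketch below is a hand-written version under those assumptions; the function rank_sum_p and its keyword names are hypothetical and do not correspond to any library signature.

```python
import math


def rank_sum_p(x, y, tie_correction=True, continuity=True):
    """Two-sided normal-approximation p-value for the rank sum / U test."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # U statistic for x: pairs with x > y count 1, tied pairs count 1/2.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    # Tie term: sum of t^3 - t over groups of equal values in the pooled sample.
    counts = {}
    for v in list(x) + list(y):
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values()) if tie_correction else 0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / float(n * (n - 1)))
    dist = abs(u - n1 * n2 / 2.0)
    if continuity:
        dist = max(dist - 0.5, 0.0)
    # 2 * (1 - Phi(z)) with z = dist / sqrt(var)
    return math.erfc(dist / math.sqrt(2.0 * var))


A1 = list(range(20))
A2 = [a + (i + 1) / 100.0 for i, a in enumerate(A1)]
B = list(range(40))

print("ties + continuity:     ", rank_sum_p(A1, B))
print("ties, no continuity:   ", rank_sum_p(A1, B, continuity=False))
print("no ties, no continuity:", rank_sum_p(A1, B, tie_correction=False,
                                            continuity=False))
```

Under these assumptions the first line should match Matlab's ranksum and R's default wilcox.test, the second R with correct=FALSE, and the third stats.ranksums. The stats.mannwhitneyu value lies between the last two and is not reproduced by this sketch.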

 

Samples without ties:

 

- Matlab: ranksum(A2,B) = 0.00296255173431

- R: wilcox.test(A2,B, exact=FALSE) = 0.00296255173431

 

- Matlab: ranksum(A2,B,'method','exact') = 0.00246078580826

- R: wilcox.test(A2,B) = 0.00246078580826

 

- R: wilcox.test(A2,B, exact=FALSE, correct=FALSE) = 0.00288777919292

- SciPy: stats.mannwhitneyu(A2,B)[1]*2 = 0.00288777919292

- SciPy: stats.ranksums(A2,B) = 0.00288777919292

 

(SciPy: mannwhitneyu = ranksums = kruskal if no ties)
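
For the sample without ties, the exact p-value can be obtained by counting how many ways of picking n1 of the n1 + n2 ranks give each possible rank sum. The sketch below does this with a small dynamic program over big integers; it assumes no ties at all, and the function names are made up for this example.

```python
def rank_sum_counts(n1, n):
    """counts[s] = number of ways to choose n1 of the ranks 1..n with sum s."""
    max_sum = sum(range(n - n1 + 1, n + 1))
    ways = [[0] * (max_sum + 1) for _ in range(n1 + 1)]
    ways[0][0] = 1
    for r in range(1, n + 1):
        # Go through k in descending order so each rank is used at most once.
        for k in range(min(r, n1), 0, -1):
            prev, cur = ways[k - 1], ways[k]
            for s in range(max_sum - r + 1):
                if prev[s]:
                    cur[s + r] += prev[s]
    return ways[n1]


def exact_rank_sum_p(x, y):
    """Two-sided exact p-value; assumes there are no ties in the pooled data."""
    n1, n2 = len(x), len(y)
    u = sum(1 for a in x for b in y if a > b)
    w = u + n1 * (n1 + 1) // 2          # rank sum of x
    counts = rank_sum_counts(n1, n1 + n2)
    total = sum(counts)                  # equals C(n1 + n2, n1)
    lower = sum(counts[:w + 1])          # arrangements with rank sum <= w
    upper = sum(counts[w:])              # arrangements with rank sum >= w
    return min(1.0, 2.0 * min(lower, upper) / total)


A2 = [a + (i + 1) / 100.0 for i, a in enumerate(range(20))]
B = list(range(40))

print("exact, no ties:", exact_rank_sum_p(A2, B))
```

This should agree with ranksum(A2,B,'method','exact') and R's default wilcox.test(A2,B) above. It is not meant for the tied sample A1, where the rank sums are no longer integers.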

 

 

2 Samples, Paired, Wilcoxon Signed Rank Test

============================================

 

(from scipy.stats import wilcoxon)

 

Matlab and SciPy do not correct for continuity; R does.

Matlab and R have different defaults for exact/approximate.

Matlab also computes exact results when ties or zeros exist.

 

With zeros:

 

- Matlab: signrank(A1,C,'method','approximate') = 0.02534731867747

- R: wilcox.test(A1 - C, correct=FALSE) = 0.02534731867747 (+warnings: ties + zeros)

 

- Matlab: signrank(A1,C) = 0.06250000000000

 

- R: wilcox.test(A1 - C) = 0.0368884257070 (+warnings: ties + zeros)

 

- SciPy: wilcoxon(A1,C) = nan (+error: sample size too small)
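
The three different values for the sample with zeros can be reproduced by hand. The sketch below drops zero differences, gives tied absolute differences their average rank, and then computes either a normal approximation (with optional continuity correction) or an exact p-value by enumerating all sign patterns. This is one way to arrive at the numbers, assuming the standard formulas; it is not claimed to be how Matlab or R implement the test, and all function names are hypothetical.

```python
import math
from itertools import product


def signed_rank_setup(x, y):
    """Drop zero differences; return (midranks of |d|, sum of positive ranks)."""
    d = [a - b for a, b in zip(x, y) if a != b]
    absd = [abs(v) for v in d]
    # Average rank of v: (number smaller) + (number equal + 1) / 2
    ranks = [sum(1 for u in absd if u < v) + (absd.count(v) + 1) / 2.0
             for v in absd]
    v_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    return ranks, v_plus


def signed_rank_approx_p(x, y, continuity=False):
    """Two-sided normal approximation with tie-corrected variance."""
    ranks, v_plus = signed_rank_setup(x, y)
    n = len(ranks)
    tie_term = sum(ranks.count(r) ** 3 - ranks.count(r) for r in set(ranks))
    var = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    dist = abs(v_plus - n * (n + 1) / 4.0)
    if continuity:
        dist = max(dist - 0.5, 0.0)
    return math.erfc(dist / math.sqrt(2.0 * var))


def signed_rank_exact_p(x, y):
    """Two-sided exact p-value by enumerating all 2**n sign patterns."""
    ranks, v_plus = signed_rank_setup(x, y)
    lower = upper = 0
    for signs in product((0, 1), repeat=len(ranks)):
        v = sum(r for r, s in zip(ranks, signs) if s)
        lower += v <= v_plus
        upper += v >= v_plus
    return min(1.0, 2.0 * min(lower, upper) / 2 ** len(ranks))


A1 = list(range(20))
C = list(range(15)) + list(range(16, 21))

print("approximate:           ", signed_rank_approx_p(A1, C))
print("approximate, continuity:", signed_rank_approx_p(A1, C, continuity=True))
print("exact (enumeration):   ", signed_rank_exact_p(A1, C))
```

Only five non-zero differences remain here, all equal to -1, so the smallest possible two-sided exact p-value is 2/2^5 = 0.0625. The enumeration is only practical for small n.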

      

Without zeros:

 

- Matlab: signrank(A2,C,'method','exact') = 0.59581947326660

- R: wilcox.test(A2 - C) = 0.59581947326660

 

- Matlab: signrank(A2,C) = 0.57548622813650     

- R: wilcox.test(A2 - C, exact=FALSE, correct=FALSE) = 0.57548622813650

- SciPy: wilcoxon(A2,C) = 0.57548622813650

 

- R: wilcox.test(A2 - C, exact=FALSE) = 0.5882844808893
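
For the sample without zeros or ties, the exact signed rank distribution can be built by counting the subsets of the ranks 1..n that give each possible sum. The sketch below does that and also evaluates the normal approximation with and without continuity correction. It assumes no zero differences and no tied absolute differences; the function names are made up for this illustration.

```python
import math


def positive_rank_sum(x, y):
    """Return (n, V) where V is the sum of ranks of the positive differences."""
    d = [a - b for a, b in zip(x, y)]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    v_plus = sum(rank for rank, i in enumerate(order, start=1) if d[i] > 0)
    return len(d), v_plus


def signed_rank_exact_p(x, y):
    """Two-sided exact p-value (no zeros, no ties assumed)."""
    n, v = positive_rank_sum(x, y)
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)   # counts[s] = subsets of 1..n with sum s
    counts[0] = 1
    for r in range(1, n + 1):
        # Descending s so that rank r is added at most once per subset.
        for s in range(max_sum - r, -1, -1):
            counts[s + r] += counts[s]
    lower = sum(counts[:v + 1])
    upper = sum(counts[v:])
    return min(1.0, 2.0 * min(lower, upper) / 2 ** n)


def signed_rank_approx_p(x, y, continuity=False):
    """Two-sided normal approximation (no zeros, no ties assumed)."""
    n, v = positive_rank_sum(x, y)
    var = n * (n + 1) * (2 * n + 1) / 24.0
    dist = abs(v - n * (n + 1) / 4.0)
    if continuity:
        dist = max(dist - 0.5, 0.0)
    return math.erfc(dist / math.sqrt(2.0 * var))


A2 = [a + (a + 1) / 100.0 for a in range(20)]
C = list(range(15)) + list(range(16, 21))

print("exact:                  ", signed_rank_exact_p(A2, C))
print("approximate:            ", signed_rank_approx_p(A2, C))
print("approximate, continuity:", signed_rank_approx_p(A2, C, continuity=True))
```

The three printed values should line up with the three groups of results listed above: exact, approximate without continuity correction, and approximate with it.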

 

Elias

 
