[SciPy-User] stats.ranksums vs. stats.mannwhitneyu

Nils Kölling nkoelling@gmail....
Wed Oct 10 07:59:30 CDT 2012


Thank you for your reply, Josef! Is there any reason you are
calculating the test manually in your code instead of using
scipy.stats.kruskal?

I have now written my own routine for permutation-based p-values
around stats.mannwhitneyu and ran a few trials. Here is what I get for:

a=8*[0]
b=n*[1]

 n   normal              permuted
 1   0.0133283287808     0.109775608976
 2   0.00491580235039    0.0232390704372
 3   0.00244136177941    0.00559977600896
 4   0.00131365315366    0.00185992560298
 5   0.000731481991814   0.000719971201152
 6   0.000414875963454   0.000539978400864
 7   0.000237996579543   0.00019999200032
 8   0.000137586057166   0.000159993600256
 9   7.99851933706e-05   7.9996800128e-05

So if we take the permuted p-value as the "true" value, it seems like
one could get away with the normal approximation (the
non-permutation-based version) for n >= 5, since from there on the
permuted value no longer differs much from the normal one. What do
you think?
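
For reference, a minimal sketch of what such a permutation test might
look like. This is not the code that produced the numbers above. To
keep it self-contained it computes U directly in pure Python instead
of calling stats.mannwhitneyu, and the names u_statistic and
permutation_pvalue, the use of the smaller of the two U values as the
test statistic, and the +1 correction in the estimate are all
assumptions made for illustration.

```python
import random


def u_statistic(x, y):
    """Mann-Whitney U of x versus y, counting each tied pair as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def small_u(x, y):
    """Smaller of the two U statistics (assumed test statistic)."""
    u = u_statistic(x, y)
    return min(u, len(x) * len(y) - u)


def permutation_pvalue(a, b, n_perm=25000, seed=0):
    """Monte Carlo permutation p-value: the share of random relabelings
    whose statistic is at least as extreme as the observed one."""
    rng = random.Random(seed)
    n_a = len(a)
    observed = small_u(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        if small_u(pooled[:n_a], pooled[n_a:]) <= observed:
            hits += 1
    # +1 in numerator and denominator keeps the estimate above zero
    # (an assumption, not necessarily what was used above).
    return (hits + 1) / (n_perm + 1)
```

Calling permutation_pvalue(8*[0], 1*[1]) gives an estimate that
fluctuates around 1/9, because the single 1 ends up in b in one of
nine equally likely splits. Note that with a fixed number of
permutations the estimate gets noisy once the p-value approaches
1/n_perm, so differences in the last rows of a table like the one
above may partly be sampling error.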

Cheers

Nils

