[SciPy-user] help with scipy.stats.mannwhitneyu

josef.pktd@gmai... josef.pktd@gmai...
Thu Feb 5 14:30:07 CST 2009

On Thu, Feb 5, 2009 at 2:51 PM, Sturla Molden <sturla@molden.no> wrote:
> On 2/5/2009 7:03 PM, Sturla Molden wrote:
>> By the way, there is a function scipy.stats.ranksums that does a
>> Wilcoxon rank-sum test. It seems to be using a large-sample
>> approximation, and has no correction for tied ranks.
> Here is a modification of SciPy's ranksums to allow small samples and
> correct for tied ranks.

There is an absolute value missing, abs(z - expected). I also prefer
the correction p*2, since it is a two-sided test.
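The two fixes can be sketched as follows; this is a hypothetical tie-corrected, two-sided rank-sum test using the large-sample normal approximation, not the actual patch from this thread (the function name ranksum_twosided is illustrative):

```python
import numpy as np
from scipy import stats

def ranksum_twosided(x, y):
    """Two-sided rank-sum z-test with tie correction (sketch)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = np.concatenate([x, y])
    ranks = stats.rankdata(combined)          # midranks for tied values
    s = ranks[:n1].sum()                      # rank sum of the first sample
    expected = n1 * (n + 1) / 2.0
    # tie correction: reduce the variance by the sum of (t**3 - t)
    # over the tie-group sizes t
    _, counts = np.unique(combined, return_counts=True)
    tie_term = (counts**3 - counts).sum()
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1.0)))
    z = (s - expected) / np.sqrt(var)
    p = 2 * stats.norm.sf(abs(z))             # abs() picks the correct tail
    return s, z, p
```

Without ties the tie_term vanishes and this reduces to the usual normal-approximation rank-sum test.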

Sample size 20, 9 ties. rwilcex below is R's wilcox.exact; ranksums
is your modified ranksums.

>>> rwilcex(rvs1[:20],4*ind10+rvs2t[:20],exact=True)['p.value']
>>> ranksums(rvs1[:20],4*ind10+rvs2t[:20])     #wrong tail because no abs()
(357.0, -1.4336547191212172, 0.9241645900073665, 0.92800000000000005)
>>> ranksums(4*ind10+rvs2t[:20],rvs1[:20])
(463.0, 1.4336547191212172, 0.075835409992633496, 0.068000000000000005)
>>> ranksums(4*ind10+rvs2t[:20],rvs1[:20])[3]*2
>>> ranksums(4*ind10+rvs2t[:20],rvs1[:20])[2]*2
>>> stats.mannwhitneyu(rvs1[:20],4*ind10+rvs2t[:20])[1]*2

With this correction, the normal distribution based p-value in
ranksums is exactly the same as in stats.mannwhitneyu. Your Monte
Carlo p-value differs more from R's exact result than the normal
distribution based p-value does.
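The agreement is no accident: the Mann-Whitney U and the rank sum W are linked by the identity U1 = W1 - n1*(n1+1)/2, so both normal approximations are built on the same z. A small sanity check of that identity (variable names are illustrative; assumes a recent scipy, where older versions may report the other sample's U):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = rng.normal(0.5, size=20)

ranks = stats.rankdata(np.concatenate([x, y]))
w1 = ranks[:20].sum()        # rank sum W of the first sample
u1 = w1 - 20 * 21 / 2.0      # Mann-Whitney U via U1 = W1 - n1*(n1+1)/2
res = stats.mannwhitneyu(x, y, alternative='two-sided')
# res.statistic is U1 in recent scipy; older versions may return U2
print(u1, res.statistic)
```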

Overall, the differences in p-values look pretty small in the examples
I tried, so my guess is that a Monte Carlo study of the size and
power of the tests will show very similar rejection rates at critical
values of 0.05 or 0.1.

But I don't have time for that now.

