[SciPy-user] help with scipy.stats.mannwhitneyu
Sturla Molden
sturla@molden...
Thu Feb 5 08:32:59 CST 2009
On 2/5/2009 12:37 PM, Wavy Davy wrote:
> I am using mannwhitneyu in the stats module, and I was looking at the
> code and I see this notice in the docstring.
>
> "Use only when the n in each condition is < 20 and you have 2
> independent samples of ranks. "
>
> Am I reading it correctly that this test should only be used with
> sample sizes less than 20?

First of all, the Mann-Whitney U-test should NEVER be used. It has
assumptions that are mathematically problematic, known as the
"Behrens-Fisher problem". What you probably want to use is the "Wilcoxon
rank-sum test". Despite common belief, Mann-Whitney U and Wilcoxon
rank-sum are not the same test. The latter assumes equal variance; the
former does not. The Mann-Whitney U has even been shown to fail when
distributions have unequal variance (Journal of Experimental Education,
Vol. 60, 1992), so its justification over the Wilcoxon rank-sum test is
questionable. Wikipedia says the Wilcoxon rank-sum test assumes equal
sample sizes; this is not correct.

I would vote for the immediate removal of the Mann-Whitney U-test from
SciPy. The only thing it should do is raise an exception and instruct
the user to apply a t-test or Wilcoxon rank-sum test instead.
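
For reference, here is a minimal sketch of calling the two suggested
alternatives, scipy.stats.ttest_ind and scipy.stats.ranksums. The sample
data, sizes, and seed are made up for illustration only, and
np.random.default_rng assumes a reasonably recent NumPy.

```python
import numpy as np
from scipy import stats

# Hypothetical skewed samples of unequal size (illustration only).
rng = np.random.default_rng(0)
a = rng.exponential(scale=1.0, size=50)
b = rng.exponential(scale=1.5, size=60)

# Two-sample t-test; by default it assumes equal variances.
t_stat, t_p = stats.ttest_ind(a, b)

# Wilcoxon rank-sum test; returns a z-like statistic and a p-value.
z_stat, w_p = stats.ranksums(a, b)

print("t-test:   statistic=%.3f  p=%.4f" % (t_stat, t_p))
print("rank-sum: statistic=%.3f  p=%.4f" % (z_stat, w_p))
```

Note that both samples may have different lengths in either call.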

As a side note, if you request a Mann-Whitney test in MINITAB, you
actually get a Wilcoxon rank-sum test instead.

Then for your question:

If N > 20, you can just as well use a t-test. Its assumptions will be
asymptotically valid due to the central limit theorem, even though the
data are not normally distributed. If you are worried about outliers, as
opposed to systematic deviation from normality, use the Wilcoxon
rank-sum test instead. When the data are transformed to rank scale and
the two sample sizes are M and N respectively, the Mann-Whitney
U-statistic has O(N*M) complexity, whereas the Wilcoxon rank-sum
statistic has only O(N+M) complexity. The O(N*M) behaviour makes the
Mann-Whitney U-statistic intractable for large samples.
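
To make the complexity argument concrete, here is a rough pure-Python
sketch of the two computations. The function names (u_by_pairs,
average_ranks, rank_sum) and the toy data are hypothetical, not SciPy
code. The U statistic is computed here by its pairwise definition, which
takes N*M comparisons; the rank sum is a single pass once the pooled
data are on rank scale (the ranking step itself needs a sort, which this
sketch includes).

```python
def u_by_pairs(x, y):
    """U statistic for x by comparing every (x, y) pair: N*M comparisons."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5  # a tie counts as half
    return u


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum(x, y):
    """Sum of the pooled ranks belonging to x: one pass over N+M ranks."""
    ranks = average_ranks(list(x) + list(y))
    return sum(ranks[:len(x)])


# Toy data, made up for illustration.
x = [1.1, 3.4, 2.2, 5.0, 4.1]
y = [0.7, 2.2, 1.9, 6.3]

u = u_by_pairs(x, y)   # 12.5
w = rank_sum(x, y)     # 27.5
print(u, w)
```

On this toy data the two numbers differ by len(x)*(len(x)+1)/2 = 15,
which is the standard identity linking the pairwise count to the rank
sum.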
Sturla Molden