[SciPy-User] Questions/comments about scipy.stats.mannwhitneyu
Fri Feb 15 10:35:03 CST 2013
On Fri, Feb 15, 2013 at 11:16 AM, <firstname.lastname@example.org> wrote:
> On Thu, Feb 14, 2013 at 7:06 PM, Chris Rodgers <email@example.com> wrote:
>> Hi all
>> I use scipy.stats.mannwhitneyu extensively because my data is not at
>> all normal. I have run into a few "gotchas" with this function and I
>> wanted to discuss possible workarounds with the list.
> Can you open a ticket? http://projects.scipy.org/scipy/report
> I partially agree, but any changes won't be backwards compatible, and
> I don't have time to think about this enough.
>> 1) When this function returns a significant result, it is non-trivial
>> to determine the direction of the effect! The Mann-Whitney test is NOT
>> a test on difference of medians or means, so you cannot determine the
>> direction from these statistics. Wikipedia has a good example of why
>> it is not a test for difference of median.
>> I've reprinted it here. The data are the finishing order of hares and
>> tortoises. Obviously this is contrived but it indicates the problem.
>> First the setup:
>> results_l = ('H H H H H H H H H T T T T T T T T T T '
>>              'H H H H H H H H H H T T T T T T T T T').split(' ')
>> h = [i for i in range(len(results_l)) if results_l[i] == 'H']
>> t = [i for i in range(len(results_l)) if results_l[i] == 'T']
>> And the results:
>> In : scipy.stats.mannwhitneyu(h, t)
>> Out: (100.0, 0.0097565768849708391)
>> In : np.median(h), np.median(t)
>> Out: (19.0, 18.0)
>> Hares are significantly faster than tortoises, but we cannot determine
>> this from the output of mannwhitneyu. This could be fixed by either
>> returning u1 and u2 from the guts of the function, or testing them in
>> the function and returning the comparison. My current workaround is
>> testing the means which is absolutely wrong in theory but usually
>> correct in practice.
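A minimal sketch of the u1/u2 idea, in plain Python rather than anything taken from the scipy source. The helper name u_statistics is hypothetical. It counts, over all pairs, how often a value from one sample exceeds a value from the other, with ties counted as one half for each side:

```python
def u_statistics(x, y):
    """Return (u_x, u_y) for two samples.

    u_x counts the pairs (a, b), a from x and b from y, with a > b;
    a tie adds 0.5. By construction u_x + u_y == len(x) * len(y).
    This is a hypothetical helper, not part of scipy.stats.
    """
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Same hare/tortoise finishing order as in the example above.
results_l = ('H H H H H H H H H T T T T T T T T T T '
             'H H H H H H H H H H T T T T T T T T T').split(' ')
h = [i for i, r in enumerate(results_l) if r == 'H']
t = [i for i, r in enumerate(results_l) if r == 'T']

u_h, u_t = u_statistics(h, t)
print(u_h, u_t)
```

Here u_h is 100 and u_t is 261, so a hare has the later finishing position in only 100 of the 361 hare/tortoise pairs. Comparing the two values against each other (or against len(h) * len(t) / 2) gives the direction without looking at means or medians.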
> In some cases I'm reluctant to return the direction when we use a
> two-sided test. In this case we don't have a one-sided test.
> In analogy to ttests, I think we could return the individual u1, u2
to expand a bit:
For the Kolmogorov Smirnov test, we refused to return an indication of
the direction. The alternative is two-sided and the distribution of
the test statistic and the test statistic itself are different in the
one-sided case.
So we shouldn't draw any one-sided conclusions from the two-sided test.
In the t_test and mannwhitneyu the test statistic is normally
distributed (in large samples), so we can infer the one-sided test
from the two-sided statistic and p-value.
If there are tables for the small sample case, we would need to check
if we get consistent interpretation between one- and two-sided tests.
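A rough sketch of how the one-sided and two-sided p-values relate under the large-sample normal approximation. The function name normal_approx and its use_continuity flag are hypothetical, the variance formula assumes there are no ties, and the 0.5 continuity correction is an assumption chosen because it appears to reproduce the p-value quoted earlier in the thread:

```python
import math


def normal_approx(u, n_x, n_y, use_continuity=True):
    """Return (z, p_one_sided, p_two_sided) for a U statistic.

    Assumes no ties, so the null standard deviation needs no tie
    correction. Hypothetical helper, not part of scipy.stats.
    """
    mean_u = n_x * n_y / 2.0
    sd_u = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12.0)
    diff = u - mean_u
    if use_continuity:
        # Shrink the deviation by 0.5 toward zero, keeping its sign.
        diff = math.copysign(max(abs(diff) - 0.5, 0.0), diff)
    z = diff / sd_u
    # Upper tail of the standard normal at |z|.
    p_one = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    p_two = min(1.0, 2.0 * p_one)
    return z, p_one, p_two


# U = 100 with 19 observations per group, as in the example above.
z, p_one, p_two = normal_approx(100.0, 19, 19)
print(z, p_one, p_two)
```

The sign of z carries the direction (negative means U is below its null mean), and the two-sided p-value is twice the one-sided one. This only holds where the normal approximation is reasonable; the small-sample case would need the consistency check mentioned above.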
>> 2) The documentation states that the sample sizes must be at least 20.
>> I think this is because the normal approximation for U is not valid
>> for smaller sample sizes. Is there a table of critical values for U in
>> scipy.stats that is appropriate for small sample sizes or should the
>> user implement his or her own?
> not available in scipy. I never looked at this.
> pull requests for this are welcome if it works. It would be backwards
> compatible.
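For very small samples the exact null distribution of U can be enumerated directly instead of looked up in a table. The following is only a sketch under the assumption that there are no ties; the name exact_u_cdf is hypothetical, and the brute-force loop is only practical while the number of rank assignments stays small:

```python
from itertools import combinations


def exact_u_cdf(u_obs, n_x, n_y):
    """Exact P(U <= u_obs) under the null hypothesis, assuming no ties.

    Under the null, every choice of n_x ranks out of 1..n_x+n_y for the
    first sample is equally likely. For each choice, U is the rank sum
    minus n_x * (n_x + 1) / 2. Hypothetical helper, not part of scipy.
    """
    total_ranks = n_x + n_y
    offset = n_x * (n_x + 1) // 2
    hits = 0
    count = 0
    for ranks in combinations(range(1, total_ranks + 1), n_x):
        count += 1
        if sum(ranks) - offset <= u_obs:
            hits += 1
    return hits / count


# With 3 observations per group there are 20 equally likely assignments.
print(exact_u_cdf(0, 3, 3))
print(exact_u_cdf(1, 3, 3))
```

With three observations in each group, only one of the 20 assignments gives U = 0, so the smallest attainable one-sided p-value is 1/20. That is one way to see why the normal approximation cannot be trusted for samples that small.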
>> 3) This is picky, but is there a reason that it returns a one-tailed
>> p-value, while other tests (e.g. ttest_*) default to two-tailed?
> legacy wart, that I don't like, but it wasn't offending me enough to change it.
>> Thanks for any thoughts, tips, or corrections and please don't take
>> these comments as criticisms ... if I didn't enjoy using scipy.stats
>> so much I wouldn't bother bringing this up!
> Thanks for the feedback.
> In large parts review of the functions relies on comments by users
> (and future contributors).
> The main problem is how to make changes without breaking current
> usage, since many of those functions are widely used.