[SciPy-Dev] anyone want to fix Mann-Whitney test?

josef.pktd@gmai... josef.pktd@gmai...
Sun Feb 5 08:28:39 CST 2012

On Sun, Feb 5, 2012 at 8:28 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:

> On Sun, Feb 5, 2012 at 1:19 PM, <josef.pktd@gmail.com> wrote:
>> On Sun, Feb 5, 2012 at 5:17 AM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
>>> Hi,
>>> There's a bug report and a number of new tests for mannwhitneyu at
>>> http://projects.scipy.org/scipy/ticket/1593. These plus a fix were
>>> contributed by Sebastian Pölsterl. Unfortunately, he based his initial fix
>>> on GPL'ed R code, so I think we can't use that, even after he
>>> modified it. I looked at the GPL code too; I think we need someone who
>>> didn't do that to implement a new fix based only on the tests and bug
>>> report.
>>> Any takers?
>> From what I remember, this is only a "cosmetic" change, or rather a
>> change in which statistic is returned.
>> >>> v, pval = stats.mannwhitneyu(x, y)
>> >>> len(x)*len(y) - v
>> 498.0
> Ah, okay. I'm not sure if this is a desirable change then. Any idea why it
> was implemented like this?

No, I was just fixing bugs. This was one of the early tests I worked on,
when I didn't yet have strong opinions about which return values are
standard or more informative. Since the p-values are correct, I didn't care
too much about which test statistic is reported.

Looking a bit closer, I'm in favor of the change. Returning the short tail
instead of the requested tail in a one-sided test is not really "clean",
and while trying to rewrite this I found it hard to figure out which is
which, 210 or 498. I haven't finished yet. I like requests with a full test suite.
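
To make the "which is which" problem concrete, here is a minimal sketch of
how the two U statistics relate. It is not the SciPy implementation; the
helper names `midranks` and `u_statistics` are made up for illustration. The
two values always sum to len(x)*len(y), which is why one can be recovered
from the other as in the snippet quoted above.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def u_statistics(x, y):
    """Hypothetical helper: return (U1, U2) for samples x and y.

    U1 counts pairs (x_i, y_j) with x_i > y_j, with ties counted as 0.5;
    U2 is the same count with the roles of x and y swapped.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    return u1, u2
```

Reporting min(U1, U2) hides which sample the statistic refers to; reporting
U1 for the first sample (or the one matching the requested tail) does not.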

If I remember correctly, almost all of our tests return the two-sided
p-value, so adding an option for a one-sided test there will be backwards
compatible. For mannwhitneyu, which currently returns a one-sided p-value,
that might not be possible.
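
A rough sketch of what such an option could look like, assuming the plain
large-sample normal approximation with no tie or continuity correction. The
function name `normal_approx_pvalue` and its signature are hypothetical, not
an existing SciPy API; the option names follow the less/greater/two-sided
convention mentioned below.

```python
from math import erfc, sqrt


def normal_approx_pvalue(u1, n1, n2, alternative="two-sided"):
    """Hypothetical helper: p-value for U1 under the normal approximation.

    u1 is the U statistic of the first sample, n1 and n2 the sample sizes.
    Assumes no ties and applies no continuity correction.
    """
    mean = n1 * n2 / 2.0
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean) / sd
    p_less = 0.5 * erfc(-z / sqrt(2.0))    # P(Z <= z)
    p_greater = 0.5 * erfc(z / sqrt(2.0))  # P(Z >= z)
    if alternative == "less":
        return p_less
    if alternative == "greater":
        return p_greater
    if alternative == "two-sided":
        # Valid because the normal distribution is symmetric.
        return min(1.0, 2.0 * min(p_less, p_greater))
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
```

With this layout the caller always gets the tail that was asked for, and the
two-sided value is twice the smaller tail only because the reference
distribution is symmetric.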

>> >>> pval*2
>> 9.188326533255e-05
>> docstring says:
>>     The reported p-value is for a one-sided hypothesis, to get the
>>     two-sided p-value multiply the returned p-value by 2.
>> Currently, I think none of the tests that use the normal or t distribution
>> has a one-sided versus two-sided option, but I think one could be added everywhere.
>> One argument in favor of adding two one-sided options is that we return
>> the correct tail instead of the smaller tail.
> fisher_exact, kstest and ks_twosamp have less/greater/two-sided. I also
> think it makes sense to add them where possible.

None of these has a symmetric test distribution, as far as I remember. So
for those it's not easy to figure out how to move from the one-sided short
tail to the two-sided p-value, or the other way around.
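
A small illustration of the asymmetry issue, using the hypergeometric
distribution that underlies a Fisher-type exact test. This is a sketch, not
the fisher_exact code: the helper names are made up, and the two-sided rule
used here (sum the probabilities of all outcomes no more likely than the
observed one) is one common convention that I am assuming for the example.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(k successes) when drawing n items from N, of which K are successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def exact_tails(k_obs, N, K, n):
    """Hypothetical helper: return (less, greater, two_sided) p-values."""
    lo, hi = max(0, n - (N - K)), min(K, n)
    pmf = {k: hypergeom_pmf(k, N, K, n) for k in range(lo, hi + 1)}
    less = sum(p for k, p in pmf.items() if k <= k_obs)
    greater = sum(p for k, p in pmf.items() if k >= k_obs)
    # Assumed convention: add up outcomes at most as probable as the observed
    # one; the small relative tolerance guards against rounding noise.
    cutoff = pmf[k_obs] * (1.0 + 1e-12)
    two_sided = sum(p for p in pmf.values() if p <= cutoff)
    return less, greater, two_sided


less, greater, two_sided = exact_tails(3, N=10, K=3, n=4)
doubled = 2.0 * min(less, greater)
```

For this table the probabilities of k = 0..3 are 35/210, 105/210, 63/210 and
7/210, so the short tail is 7/210, doubling it gives 14/210, but the
two-sided value under the assumed convention is still 7/210.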


> Ralf