[SciPy-Dev] anyone want to fix Mann-Whitney test?
Sun Feb 5 08:28:39 CST 2012
On Sun, Feb 5, 2012 at 8:28 AM, Ralf Gommers <email@example.com> wrote:
> On Sun, Feb 5, 2012 at 1:19 PM, <firstname.lastname@example.org> wrote:
>> On Sun, Feb 5, 2012 at 5:17 AM, Ralf Gommers <email@example.com> wrote:
>>> There's a bug report and a number of new tests for mannwhitneyu at
>>> http://projects.scipy.org/scipy/ticket/1593. These plus a fix were
>>> contributed by Sebastian Pölsterl. Unfortunately, he based his initial fix
>>> on GPL'ed R code, so I think we can't use it, even after he
>>> modified it. I looked at the GPL code too; I think we need someone who
>>> didn't do that to implement a new fix based only on the tests and bug
>>> report. Any takers?
>> From what I remember, this is only a "cosmetic" change, or rather a
>> change in which statistic is returned.
>> >>> v, pval = stats.mannwhitneyu(x, y)
>> >>> len(x)*len(y) - v
> Ah, okay. I'm not sure if this is a desirable change then. Any idea why it
> was implemented like this?
No, I was just fixing bugs. This was one of the early tests I worked on,
when I didn't yet have strong opinions about which return values are standard
or more informative. Since the p-values are correct, I didn't care too much
about which test statistic is reported.
Looking a bit closer, I'm in favor of the change. Returning the short tail
instead of the requested tail in a one-sided test is not really "clean",
and while trying to rewrite this, I found it hard to tell which is which, 210
or 498. I haven't finished yet. I like requests with a full test suite.
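
For illustration, here is a minimal pure-Python sketch of why the two candidate
statistics are interchangeable: the U statistics of the two samples always sum
to len(x)*len(y), which is the `len(x)*len(y) - v` conversion quoted in this
thread. The helper names `average_ranks` and `u_statistics` and the sample data
are hypothetical, and this is not the scipy.stats implementation.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def u_statistics(x, y):
    """Return (u_x, u_y), the Mann-Whitney U statistic of each sample."""
    n_x, n_y = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n_x])
    # U for x: rank sum of x minus the smallest possible rank sum.
    u_x = rank_sum_x - n_x * (n_x + 1) / 2.0
    # The two statistics are complements with respect to n_x * n_y.
    u_y = n_x * n_y - u_x
    return u_x, u_y


# Hypothetical data, chosen to include a tie across the two samples.
x = [1.1, 2.3, 2.3, 4.0, 5.2]
y = [0.5, 2.3, 3.1]
u_x, u_y = u_statistics(x, y)
print(u_x, u_y, min(u_x, u_y))
```

Reporting min(u_x, u_y) corresponds to the "short tail" convention; reporting
the statistic of a fixed sample keeps the direction of a one-sided test
unambiguous.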
If I remember correctly, we almost always return the two-sided test
result, so adding an option for one-sided tests will be backwards compatible
there; for mannwhitneyu, however, that might not be possible.
>> >>> pval*2
>> docstring says:
>> The reported p-value is for a one-sided hypothesis, to get the
>> two-sided p-value multiply the returned p-value by 2.
>> Currently, I think none of the tests that use the normal or t distribution
>> have a one- versus two-sided option, but I think it could be added everywhere.
>> One argument in favor of adding two one-sided options is that we return
>> the correct tail instead of the smaller tail.
> fisher_exact, kstest and ks_2samp have less/greater/two-sided. I also
> think it makes sense to add them where possible.
None of these has a symmetric test distribution, as far as I remember. So, for
those it's not easy to figure out how to move from the one-sided short tail to
the two-sided p-value or the other way around.
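
By contrast, for a statistic with a symmetric null distribution the conversion
is simple. The following is a minimal sketch assuming a standard normal
approximation for a standardized statistic z; the function names are
hypothetical and not part of scipy.stats.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tail_p_values(z):
    """Return (p_less, p_greater, p_two_sided) for a standardized statistic z."""
    p_less = normal_cdf(z)       # P(Z <= z)
    p_greater = normal_cdf(-z)   # P(Z >= z), by symmetry of the normal
    # With a symmetric distribution, the two-sided p-value is twice the
    # smaller tail (capped at 1).
    p_two_sided = min(1.0, 2.0 * min(p_less, p_greater))
    return p_less, p_greater, p_two_sided


p_less, p_greater, p_two_sided = tail_p_values(-1.5)
print(p_less, p_greater, p_two_sided)
```

Going the other way, halving a two-sided p-value recovers only the smaller
tail; the sign of z is still needed to tell which one-sided alternative it
belongs to.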