On Sun, Feb 5, 2012 at 8:28 AM, Ralf Gommers wrote:
>
>
> On Sun, Feb 5, 2012 at 1:19 PM, <josef.pktd@gmail.com> wrote:
>
>>
>>
>> On Sun, Feb 5, 2012 at 5:17 AM, Ralf Gommers wrote:
>>
>>> Hi,
>>>
>>> There's a bug report and a number of new tests for mannwhitneyu at
>>> http://projects.scipy.org/scipy/ticket/1593. These, plus a fix, were
>>> contributed by Sebastian Pölsterl; unfortunately, he based his initial fix
>>> on GPL'ed R code. Therefore I think we can't use that, even after he
>>> modified it. I looked at the GPL code too, so I think we need someone who
>>> hasn't seen it to implement a new fix based only on the tests and the bug
>>> report.
>>>
>>> Any takers?
>>>
>>
>> From what I remember, my impression is that this is only a "cosmetic"
>> change, or rather a change in which statistic is returned.
>>
>>> v, pval = stats.mannwhitneyu(x, y)
>>> len(x)*len(y) - v
498.0
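The check above works because the two Mann-Whitney U statistics always add up to len(x)*len(y): every (x_i, y_j) pair contributes 1 to one of them, or 1/2 to each when the values tie. The following is a brute-force sketch on made-up data; `count_wins` is a hypothetical helper, and a real implementation would work from ranks rather than loop over all pairs.

```python
def count_wins(a, b):
    """Number of pairs (a_i, b_j) with a_i > b_j; each tie counts as 1/2."""
    total = 0.0
    for ai in a:
        for bj in b:
            if ai > bj:
                total += 1.0
            elif ai == bj:
                total += 0.5
    return total


# Made-up samples, including one tie (2.2) across the two groups.
x = [1.2, 3.4, 2.2, 5.0]
y = [0.7, 2.2, 4.1]

u_x = count_wins(x, y)   # 7.5
u_y = count_wins(y, x)   # 4.5
print(u_x, u_y, u_x + u_y == len(x) * len(y))  # 7.5 4.5 True
```

So whichever of the two values a function reports, the other is len(x)*len(y) minus it, which is why 210 and 498 carry the same information.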
>>
>
> Ah, okay. I'm not sure this is a desirable change, then. Any idea why it
> was implemented like this?
>
No, I was just fixing bugs. This was one of the early tests I worked on,
when I didn't yet have strong opinions about what the standard or more
informative return values are. Since the p-values are correct, I didn't care
too much about which test statistic is reported.

Looking a bit closer, I'm in favor of the change. Returning the short tail
instead of the requested tail in a one-sided test is not really "clean", and
while trying to rewrite this I found it's not easy to figure out which is
which, 210 or 498. I haven't finished yet. I like requests with a full test
suite.

If I remember correctly, we almost always return the two-sided test, so
adding an option for one-sided tests would be backwards compatible; for
mannwhitneyu that might not be possible, because it currently returns a
one-sided p-value.
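One way to picture the backwards-compatibility question is a keyword whose default keeps the old output. The sketch below is hypothetical (the name `mwu_normal`, the `alternative=None` default and the example data are assumptions, not scipy's API); it uses a plain normal approximation with no tie or continuity correction. As I read the thread, `alternative=None` imitates the current behaviour, the smaller U with the smaller one-sided tail, while "less"/"greater" return the requested tail.

```python
import math


def count_wins(a, b):
    """Number of pairs (a_i, b_j) with a_i > b_j; each tie counts as 1/2."""
    return sum(1.0 if ai > bj else 0.5 if ai == bj else 0.0 for ai in a for bj in b)


def norm_sf(z):
    """Upper tail of the standard normal distribution, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mwu_normal(x, y, alternative=None):
    """Hypothetical Mann-Whitney U sketch with a plain normal approximation.

    No tie correction and no continuity correction. alternative=None keeps
    the old behaviour described in the thread: the smaller U together with
    the smaller one-sided tail.
    """
    n1, n2 = len(x), len(y)
    u_x = count_wins(x, y)                      # large when x tends to exceed y
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean) / sd
    p_greater = norm_sf(z)                      # H1: x stochastically larger
    p_less = norm_sf(-z)                        # H1: x stochastically smaller
    if alternative is None:
        return min(u_x, n1 * n2 - u_x), min(p_less, p_greater)
    if alternative == "greater":
        return u_x, p_greater
    if alternative == "less":
        return u_x, p_less
    if alternative == "two-sided":
        # Symmetric null distribution: two-sided p-value is twice the smaller tail.
        return u_x, min(1.0, 2.0 * min(p_less, p_greater))
    raise ValueError("alternative must be None, 'less', 'greater' or 'two-sided'")


x = [1.2, 3.4, 2.2, 5.0]
y = [0.7, 2.2, 4.1]
print(mwu_normal(x, y))
print(mwu_normal(x, y, "less"))
```

Because the normal approximation is symmetric, the "less" and "greater" p-values sum to one and doubling the smaller one gives the two-sided value, which is what the docstring relies on. The catch for compatibility is that the default has to keep returning a one-sided value.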
>
>>> pval*2
9.188326533255e-05
>>
>>
>> The docstring says:
>>     The reported p-value is for a one-sided hypothesis, to get the
>>     two-sided p-value multiply the returned p-value by 2.
>>
>> Currently I think none of the tests that use the normal or t distribution
>> has a one- versus two-sided option, but I think they could be added
>> everywhere. One argument in favor of adding the two one-sided options is
>> that we would then return the correct tail instead of the smaller tail.
>>
>
> fisher_exact, kstest and ks_twosamp have less/greater/two-sided options. I
> also think it makes sense to add them where possible.
>
None of these have a symmetric test distribution, as far as I remember, so
for those it's not easy to figure out how to move from the one-sided short
tail to the two-sided p-value, or the other way around.
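Fisher's exact test, one of the tests mentioned above, shows the problem: with all margins fixed, the top-left cell of a 2x2 table follows a hypergeometric distribution that is generally not symmetric, so the two-sided p-value is not simply twice the smaller tail. The sketch below assumes the common "sum the probabilities of all tables no more likely than the observed one" definition of the two-sided p-value; the helper names, the 1e-7 rounding tolerance and the example table are illustrative assumptions, not taken from scipy's source.

```python
from math import comb


def hypergeom_pmf(k, row1, col1, total):
    """P(top-left cell == k) for a 2x2 table with all margins fixed."""
    return comb(col1, k) * comb(total - col1, row1 - k) / comb(total, row1)


def fisher_pvalues(a, b, c, d):
    """One- and two-sided p-values for the table [[a, b], [c, d]] (sketch)."""
    row1, col1, total = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - total), min(row1, col1)
    probs = {k: hypergeom_pmf(k, row1, col1, total) for k in range(lo, hi + 1)}
    p_less = sum(p for k, p in probs.items() if k <= a)
    p_greater = sum(p for k, p in probs.items() if k >= a)
    # Two-sided: total probability of all tables no more likely than the
    # observed one; the small relative tolerance guards against rounding.
    p_obs = probs[a]
    p_two = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return p_less, p_greater, p_two


p_less, p_greater, p_two = fisher_pvalues(3, 1, 1, 5)
print(p_greater, p_two, 2 * min(p_less, p_greater))
```

For the table [[3, 1], [1, 5]] this gives p_greater = 25/210 ≈ 0.119 and p_two = 40/210 ≈ 0.190, whereas doubling the smaller tail would give 50/210 ≈ 0.238.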
Josef
>
> Ralf
>
