[SciPy-Dev] Resolving PR 235: t-statistic = 0/0 case
Wed Jun 6 16:35:58 CDT 2012
On Wed, Jun 6, 2012 at 5:18 PM, Junkshops <junkshops@gmail.com> wrote:
> Hi Nathaniel,
>
> At the outset, I'll just say that if the consensus is that we should
> return NaN, I'll accept that. I'll still try and argue my case though.
>
>> My R seems to throw an exception whenever the variance is zero
>> (regardless of the mean difference), not return NaN:
> Sorry, yes, that's correct.
>
>> Like any parametric test, the t-test only makes sense under some kind
>> of (at least approximate) assumptions about the data generating
>> process. When the sample variance is 0, then those assumptions are
>> clearly violated,
> So this seems similar to argument J2, and I still don't understand it.
> Let's say we assume our population data is normally distributed and we
> take three samples from the population and get [1,1,1]. How does that
> prove our assumption is incorrect? It's certainly possible to pull the
> same number three times from a normal distribution.
>
How do you justify that 3 empirical observations [1,1,1] come from a
normal distribution? If you have enough data for the central limit
theorem to come into play, and your variance is still 0, this is so
unlikely that I think the consequences of *possibly* incorrectly
returning NaN here would be small. If you're simulating data from a
known distribution, take another draw...
>> and it doesn't seem appropriate to me to start
>> making up numbers according to some other rule that we hope might give
>> some sort-of appropriate result ("In the face of ambiguity, refuse the
>> temptation to guess."). So I actually like the R/Matlab option of
>> throwing an exception or returning NaN.
>
> Well, we're not making up numbers here - we absolutely know the means
> are the same. Hence p = 1 and t = 0.
But what we don't know is whether the test is even appropriate, so why
not be cautious and return NaN? It's very easy for a user to make the
decision that NaN implies p = 1, if that's what you want to have.
This doesn't seem to be of all that much practical importance. In what
situation do you expect this to really matter?
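
To make the "user decides" point concrete, here is a rough sketch, not the
code from the PR. The names t_statistic and nan_as_equal_means are made up
for illustration, and the statistic is a hand-rolled pooled-variance
two-sample t (an assumption on my part, using only the standard library).
It returns NaN whenever the standard error is zero, and the second helper
is what a user could write to read NaN as t = 0, p = 1.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic (illustrative helper).

    Returns NaN when the standard error is zero, which covers the
    0/0 case instead of guessing a value.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    if std_err == 0.0:
        return math.nan
    return (mean(a) - mean(b)) / std_err


def nan_as_equal_means(t, p):
    """User-side choice (hypothetical): treat a NaN result as t = 0, p = 1."""
    if math.isnan(t):
        return 0.0, 1.0
    return t, p


t = t_statistic([1, 1, 1], [1, 1, 1])       # zero variance in both samples
t_user, p_user = nan_as_equal_means(t, math.nan)
```

The library stays cautious, and anyone who is sure the test applies can opt
in to p = 1 with a couple of lines.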
Skipper
