[SciPy-Dev] Resolving PR 235: t-statistic = 0/0 case
Wed Jun 6 16:18:01 CDT 2012
At the outset, I'll just say that if the consensus is that we should
return NaN, I'll accept that. I'll still try to argue my case, though.
> My R seems to throw an exception whenever the variance is zero
> (regardless of the mean difference), not return NaN:
Sorry, yes, that's correct.
> Like any parametric test, the t-test only makes sense under some kind
> of (at least approximate) assumptions about the data generating
> process. When the sample variance is 0, then those assumptions are
> clearly violated,
So this seems similar to argument J2, and I still don't understand it.
Let's say we assume our population data is normally distributed and we
take three samples from the population and get [1,1,1]. How does that
prove our assumption is incorrect? It's certainly possible to pull the
same number three times from a normal distribution.
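Here is a minimal sketch of why the [1,1,1] case ends up as 0/0. It assumes a second group also came out [1, 1, 1] and uses the pooled (equal-variance) two-sample statistic. The helper `pooled_t_parts` is hypothetical and is not SciPy's API. It only splits t into its numerator (the difference of means) and its denominator (the standard error).

```python
import math
import statistics


def pooled_t_parts(a, b):
    """Return (numerator, denominator) of the pooled two-sample t statistic.

    Hypothetical helper for illustration only, not part of SciPy.
    """
    na, nb = len(a), len(b)
    # Pooled sample variance with na + nb - 2 degrees of freedom.
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    numerator = statistics.mean(a) - statistics.mean(b)  # difference of means
    denominator = math.sqrt(sp2 * (1 / na + 1 / nb))     # standard error
    return numerator, denominator


# Both groups drew the same value three times (assumed for illustration).
num, den = pooled_t_parts([1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
print(num, den)  # 0.0 0.0

try:
    t = num / den
except ZeroDivisionError:
    # Plain Python floats refuse to evaluate 0.0 / 0.0.
    t = None
```

Both parts are exactly zero, so whatever is returned here is a convention for 0/0. Plain Python raises `ZeroDivisionError`, while NumPy's floating-point division yields `nan` with a `RuntimeWarning`.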
> and it doesn't seem appropriate to me to start
> making up numbers according to some other rule that we hope might give
> some sort-of appropriate result ("In the face of ambiguity, refuse the
> temptation to guess."). So I actually like the R/Matlab option of
> throwing an exception or returning NaN.
Well, we're not making up numbers here; we absolutely know the means
are the same. Hence p = 1 and t = 0.
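The three conventions under discussion can be made explicit in a minimal sketch: raising an exception, returning NaN, or returning t = 0 as argued above. The sketch assumes the pooled equal-variance two-sample statistic. The function `t_statistic` and its `on_degenerate` parameter are hypothetical, not SciPy's API. The p = 1 in the argument follows from t = 0: the two-sided p-value is 2·P(T ≥ 0) = 1 because the t distribution is symmetric about zero.

```python
import math
import statistics


def t_statistic(a, b, on_degenerate="nan"):
    """Pooled two-sample t statistic with an explicit policy for 0/0.

    on_degenerate (hypothetical parameter) controls the case where both
    the mean difference and the standard error are exactly zero:
      "raise" -> raise ValueError (the exception option)
      "nan"   -> return nan       (the NaN option)
      "zero"  -> return 0.0       (equal means, so t = 0 and p = 1)
    """
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    num = statistics.mean(a) - statistics.mean(b)
    den = math.sqrt(sp2 * (1 / na + 1 / nb))

    if den == 0 and num == 0:
        if on_degenerate == "raise":
            raise ValueError("t statistic is 0/0: both samples are constant and equal")
        if on_degenerate == "nan":
            return math.nan
        if on_degenerate == "zero":
            return 0.0
        raise ValueError(f"unknown on_degenerate policy: {on_degenerate!r}")
    if den == 0:
        # Nonzero mean difference with zero spread (outside the 0/0 case
        # discussed here): return a signed infinity, as IEEE 754 division would.
        return math.copysign(math.inf, num)
    return num / den


same = [1.0, 1.0, 1.0]
print(t_statistic(same, same, on_degenerate="nan"))   # nan
print(t_statistic(same, same, on_degenerate="zero"))  # 0.0
```

Making the policy a parameter keeps the 0/0 decision visible at the call site instead of hiding it inside the arithmetic.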