[SciPy-Dev] Resolving PR 235: t-statistic = 0/0 case
Junkshops
junkshops@gmail....
Wed Jun 6 16:18:01 CDT 2012
Hi Nathaniel,
At the outset, I'll just say that if the consensus is that we should
return NaN, I'll accept that. I'll still try to argue my case, though.
> My R seems to throw an exception whenever the variance is zero
> (regardless of the mean difference), not return NaN:
Sorry, yes, that's correct.
> Like any parametric test, the t-test only makes sense under some kind
> of (at least approximate) assumptions about the data generating
> process. When the sample variance is 0, then those assumptions are
> clearly violated,
So this seems similar to argument J2, and I still don't understand it.
Let's say we assume our population data is normally distributed, and we
draw a sample of three values from the population and get [1,1,1]. How
does that prove our assumption is incorrect? It's certainly possible to
pull the same number three times from a normal distribution.
> and it doesn't seem appropriate to me to start
> making up numbers according to some other rule that we hope might give
> some sort-of appropriate result ("In the face of ambiguity, refuse the
> temptation to guess."). So I actually like the R/Matlab option of
> throwing an exception or returning NaN.
Well, we're not making up numbers here - in the 0/0 case we absolutely
know the sample means are the same. Hence t = 0 and p = 1.
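
To make the disputed case concrete, here is a minimal sketch of a pooled
two-sample t-statistic with the 0/0 outcome exposed as an explicit policy.
This is not SciPy's implementation: the function name two_sample_t, the
zero_over_zero parameter and its option names are all hypothetical, and
returning a signed infinity for nonzero/0 is a further assumption that was
not part of this discussion.

```python
import math


def two_sample_t(a, b, zero_over_zero="nan"):
    """Pooled (equal-variance) two-sample t-statistic.

    zero_over_zero is a hypothetical switch for the 0/0 case:
    "nan" returns NaN, "raise" raises, "zero" returns t = 0.
    """
    na, nb = len(a), len(b)
    mean_a = sum(a) / na
    mean_b = sum(b) / nb
    # Sums of squared deviations around each sample mean.
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    stderr = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    diff = mean_a - mean_b

    if stderr == 0.0:
        if diff != 0.0:
            # nonzero / 0: signed infinity (an assumption, see lead-in).
            return math.copysign(math.inf, diff)
        # 0 / 0: the case under discussion.
        if zero_over_zero == "raise":
            raise ZeroDivisionError("t-statistic is 0/0")
        if zero_over_zero == "zero":
            return 0.0
        return math.nan
    return diff / stderr


t_nan = two_sample_t([1, 1, 1], [1, 1, 1])           # NaN policy
t_zero = two_sample_t([1, 1, 1], [1, 1, 1], "zero")  # t = 0 policy
```

With the "zero" policy the two-sided p-value would be 1, since t = 0 sits at
the centre of the t distribution; with the "nan" policy it would be NaN as
well.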
-g