[SciPy-Dev] Resolving PR 235: t-statistic = 0/0 case

Junkshops junkshops@gmail....
Wed Jun 6 23:29:09 CDT 2012


Merging the last two messages from Josef here...
> If you just have one sample (of 2 series) or draw from two
> populations, you cannot have a p-value of 1 (with nonzero probability)
> unless you sample the entire populations (or you have a dogmatic
> prior, or your model assumptions are wrong).  (I hope this has enough
> qualifiers and brackets to be correct.:)
Then why does this return a p-value of 1?

 >>> stats.ttest_ind([1e-100, 0, -1e100], [1e-100, 0, -1e100])
(0.0, 1.0)

> You have a continuum of numbers (an uncountable infinite number of
> possibilities). Each point has zero probability of being selected.
> (But we have a density that points in a neighborhood dx are selected.)
> You can select a first point, but then the second point has to be the
> same up to an infinite number of decimals. The probability that all
> decimals are the same is zero.
Heh, that's where my mind went but that seemed too simple for some 
reason. I thought I was missing something.
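
A quick way to see Josef's point is to count exact ties between pairs of draws. This is only a rough sketch using the stdlib random module; count_ties, the number of pairs and the seed are made up for illustration. Floats are finite-precision, so a tie is not strictly impossible, just extremely unlikely.

```python
import random

def count_ties(draw, n_pairs=10_000, seed=0):
    """Count how often two successive draws are exactly equal."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_pairs) if draw(rng) == draw(rng))

# Draws from (an approximation of) a continuous uniform distribution.
continuous_ties = count_ties(lambda rng: rng.random())

# Draws from a discrete uniform distribution on the integers 0..9.
discrete_ties = count_ties(lambda rng: rng.randint(0, 9))

print(continuous_ties, discrete_ties)
```

With ten equally likely values, roughly one pair in ten should tie, while ties between the float draws should essentially never show up.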

> >If you're pulling data from a discrete distribution it could happen
> >though (unless I'm mistaken).
> It can happen in the discrete distribution case, but in this case this
> doesn't have zero probability and the calculations can follow the
> standard theory (no 0/0)
Sorry, I'm not following this. If we have samples from a discrete 
distribution (and in practice, any computerized data set is discrete 
since you have to map the reals to a finite representation), then it's 
possible, however unlikely, to have [0,0,0], [0,0,0] in the data set. 
Right? In which case we could wind up with a 0/0.
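
To make the 0/0 concrete, here is a hand-rolled sketch of the two pieces of the pooled-variance t statistic. t_stat_parts is a name invented for this example, not anything in scipy, and it assumes each sample has at least two values.

```python
import math

def t_stat_parts(a, b):
    """Return (numerator, denominator) of the pooled two-sample t statistic."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    # Unbiased sample variances.
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    numerator = m1 - m2
    denominator = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return numerator, denominator

num, den = t_stat_parts([0, 0, 0], [0, 0, 0])
print(num, den)  # both the mean difference and the standard error are zero
```

Dividing num by den here is exactly the 0/0 in question: plain Python raises ZeroDivisionError, and IEEE floating-point arithmetic gives nan.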

> I thought C is the most obvious solution, paired versus unpaired are
> two different sampling schemes.
*shrug* you're the maintainer, and I don't care enough to argue. So:

- I'll merge the two 2-sample t-test functions
- add an uneq_var=False kw arg; setting it to True will use the new code
- add a zoz=np.nan kw arg and a check that it's np.nan, 0 or 1;
  otherwise raise ValueError
- update docs & tests
- push
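
Roughly what I have in mind for the two kw args, as a pure-Python sketch rather than the actual patch. two_sample_t and _check_zoz are placeholder names, it only returns the t statistic (no p-value), and it assumes zoz is the value handed back in the 0/0 case and that the uneq_var=True branch uses the Welch standard error.

```python
import math

def _check_zoz(zoz):
    """Allow only nan, 0 or 1 as the 0/0 replacement value."""
    if isinstance(zoz, float) and math.isnan(zoz):
        return zoz
    if zoz in (0, 1):
        return zoz
    raise ValueError("zoz must be nan, 0 or 1")

def two_sample_t(a, b, uneq_var=False, zoz=float("nan")):
    """t statistic for two independent samples (sketch, no p-value)."""
    zoz = _check_zoz(zoz)
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    if uneq_var:
        # Welch: each sample keeps its own variance.
        se2 = v1 / n1 + v2 / n2
    else:
        # Classic pooled-variance estimate.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se2 = pooled * (1.0 / n1 + 1.0 / n2)
    num, den = m1 - m2, math.sqrt(se2)
    if den == 0.0:
        # 0/0 returns the caller's choice; nonzero/0 returns a signed inf.
        return zoz if num == 0.0 else math.copysign(math.inf, num)
    return num / den

print(two_sample_t([0, 0, 0], [0, 0, 0]))          # falls back to zoz (nan)
print(two_sample_t([0, 0, 0], [0, 0, 0], zoz=0))   # falls back to zoz (0)
```

Validating zoz up front means a bad value fails loudly even when the data never hit the 0/0 branch.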

Anything else?

Sorry for any grey hairs I may have caused you,

Gavin

