[SciPy-Dev] Resolving PR 235: t-statistic = 0/0 case

Ralf Gommers ralf.gommers@googlemail....
Sun Jun 10 05:33:27 CDT 2012

On Sat, Jun 9, 2012 at 1:04 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:

> On Thu, Jun 7, 2012 at 10:13 PM, Nathaniel Smith <njs@pobox.com> wrote:
>> On Thu, Jun 7, 2012 at 5:29 AM, Junkshops <junkshops@gmail.com> wrote:
>> > - I'll merge the two 2 sample t-test functions
>> > - add an uneq_var=False kw arg, setting to true will use the new code
>> equal_var would be a better name, to avoid the double-negative.
>> Would it be possible/desirable to make equal_var=False the default?
>> Obviously this would require a deprecation period, but as semantic
>> changes go it's relatively low risk -- anyone who misses the warnings
>> etc. would just find one day that their t tests were producing more
>> conservative/realistic values.
> I'm not in favor of adding a deprecation warning for this. It's a minor
> thing, and warnings are annoying - it does require the user to go and
> figure out what changed. My preference would be to merge the current PR as
> is, and add a new function that combines all four t-tests with an interface
> similar to R. There the new default can be equal_var=False without annoying
> anyone.
>> (R defaults to doing the unequal variances test, and I have actually
>> seen this fact used in their advocacy, as evidence for their branding
>> as the tool for people who care about statistical rigor and
>> soundness.)
>> > - add a zoz=np.nan kw arg and a check that it's np.nan, 0 or 1.
>> > Otherwise raise ValueError
>> Let's please not add this "zoz=" feature. Adding features has a real
>> cost (in terms of testing, writing docs, maintenance, and most
>> importantly, the total time spent by all users reading about this
>> pointless thing in the docs and being distracted by it). Its only
>> benefit would be to smooth over this debate on the mailing list; I
>> can't believe that any real user will actually care about this, ever.
> Agreed.
> And +1 for 0/0 --> NaN.

The PR is now merged, with 0/0 --> NaN, and equal_var=True as the default.
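
To make the merged behaviour concrete, here is a minimal sketch of the
two-sample t statistic in plain Python. It is not the code from the PR:
the helper name t_statistic is made up for illustration, and returning a
signed infinity for nonzero/0 is an assumption that simply mirrors
floating-point division. The equal_var switch selects the pooled (Student)
or the unpooled (Welch) standard error, and the 0/0 case comes out as NaN.

```python
import math
from statistics import mean, variance


def t_statistic(a, b, equal_var=True):
    """Two-sample t statistic (hypothetical helper, not SciPy's code)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    if equal_var:
        # Student: pool the two variances into one estimate
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    else:
        # Welch: each sample keeps its own variance
        se = math.sqrt(va / na + vb / nb)
    diff = mean(a) - mean(b)
    if se == 0.0:
        # 0/0 is undefined, so report NaN.
        # Assumption: nonzero/0 gives a signed infinity, as in float division.
        return math.nan if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


# Two constant, identical samples: zero mean difference, zero variance.
print(t_statistic([1, 1, 1], [1, 1, 1]))

# Unequal sizes and variances: the two variants disagree.
print(t_statistic([1, 2, 3, 4], [0, 10], equal_var=True))
print(t_statistic([1, 2, 3, 4], [0, 10], equal_var=False))
```

In the last two calls the Welch statistic is smaller in magnitude than the
pooled one, which is the "more conservative" behaviour mentioned above.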

Two things left to decide:
1) Do we want to transition to equal_var=False as the default?
2) Do we want to unify the current 3 t-test functions into one, like R/SAS?

My answer to 2) would be yes, which also allows us to do 1) without
generating a deprecation warning. IMO this would simplify the API quite a
bit and make things more understandable for non-statisticians as well.
Comparing APIs, I find ours quite poor:

R: t.test
Matlab: ttest, ttest2
SciPy: ttest_ind, ttest_1samp, ttest_rel

The signature of a combined function ttest() would still be simple:

def ttest(a, b=None, axis=0, popmean=0, equal_var=False)
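
A rough sketch of how such a combined function could dispatch to the three
existing scipy.stats functions is given below. This is only one possible
reading of the proposal, not an agreed design. In particular, the paired
keyword is an assumption added here (modelled on R's t.test) because the
signature above has no way to choose between ttest_ind and ttest_rel when b
is given.

```python
from scipy import stats


def ttest(a, b=None, axis=0, popmean=0, equal_var=False, paired=False):
    """Hypothetical unified t-test; `paired` is an illustrative addition."""
    if b is None:
        # One sample: compare the mean of `a` against `popmean`
        return stats.ttest_1samp(a, popmean, axis=axis)
    if paired:
        # Two related samples of equal length
        return stats.ttest_rel(a, b, axis=axis)
    # Two independent samples; Welch's test unless equal_var=True
    return stats.ttest_ind(a, b, axis=axis, equal_var=equal_var)


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 9.0]
print(ttest(a))                  # one-sample test against popmean=0
print(ttest(a, b))               # independent samples, unequal variances
print(ttest(a, b, paired=True))  # related samples
```

Because equal_var=False is the default only in the new function, existing
callers of ttest_ind keep their current results and see no warning.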
