[SciPy-User] "small data" statistics

Emanuele Olivetti emanuele@relativita....
Fri Oct 12 10:27:14 CDT 2012


On 10/12/2012 01:22 PM, Nathaniel Smith wrote:
>
> On 12 Oct 2012 09:37, "Emanuele Olivetti" <emanuele@relativita.com 
> <mailto:emanuele@relativita.com>> wrote:
>
> > IMHO the question in the example at that URL, i.e. "Did the instructions
> > given to the participants significantly affect their level of recall?" is
> > not directly addressed by the permutation test.
>
> In this sentence, the word "significantly" is a term of art used to refer exactly to the 
> quantity p(t>T(data)|H_0). So, yes, the permutation test addresses the original 
> question; you just have to be familiar with the field's particular jargon to understand 
> what they're saying. :-)
>

Thanks Nathaniel for pointing that out. I guess I'll hardly become very familiar
with such jargon ;-). Nevertheless, while reading the example I believed
that the aim of the thought experiment was to decide between two competing
theories/hypotheses, given the results of the experiment.
But I take your point that the term "significant" turns it into a different question.
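As an aside for readers following the thread: a minimal sketch (not from the
original discussion) of the quantity p(t >= T(data) | H_0) estimated by a
two-sample permutation test. The function name, the choice of the difference
of means as the statistic, and the add-one smoothing are my own illustrative
assumptions:

```python
import numpy as np

def permutation_test_means(x, y, n_perm=10000, seed=0):
    """Two-sample permutation test on the difference of means.

    Estimates p(|t| >= |T(data)| under H_0) by randomly
    reassigning the pooled observations to the two groups.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = np.mean(x) - np.mean(y)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        t = np.mean(perm[:len(x)]) - np.mean(perm[len(x):])
        if abs(t) >= abs(observed):
            count += 1
    # Add-one smoothing: the observed labelling counts as one
    # permutation, so the estimated p-value is never exactly 0.
    return (count + 1) / (n_perm + 1)
```

Under H_0 the group labels are exchangeable, so the permutation distribution of
t is a valid null distribution without any parametric model of the data.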

> All tests require some kind of representativeness, and this isn't really a problem. The 
> data are by definition representative (in the technical sense) of the distribution they 
> were drawn from. (The trouble comes when you want to decide whether that distribution 
> matches anything you care about, but looking at the data won't tell you that.) A well 
> designed test is one that is correct on average across samples.
>

Indeed my wording was imprecise, so thanks once more for correcting
it. Moreover, you put it really well: "The trouble comes when you want to
decide whether that distribution matches anything you care about, but
looking at the data won't tell you that."
Could you say more about evaluating the correctness of a test across
different samples? It sounds interesting.
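One concrete reading of "correct on average across samples" is type-I error
calibration: under H_0, a well-calibrated level-alpha test should reject in
roughly a fraction alpha of repeated samples. A small simulation sketch (my
own illustration, not from the thread; sample sizes and seed are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_reps = 2000
rejections = 0
for _ in range(n_reps):
    # Both groups come from the same normal distribution, so H_0 is true
    # and the t test's assumptions hold exactly.
    x = rng.normal(size=10)
    y = rng.normal(size=10)
    if stats.ttest_ind(x, y).pvalue < alpha:
        rejections += 1

# The observed rejection rate should be close to alpha (up to
# binomial noise from the finite number of repetitions).
rate = rejections / n_reps
```

Repeating the same simulation with non-normal data (e.g. heavy-tailed draws)
is one way to probe how far the parametric assumptions can be bent before the
calibration degrades.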

> The alternative to a permutation test here is to make very strong assumptions about the 
> underlying distributions (e.g. with a t test), and these assumptions are often justified 
> only for large samples. And resampling tests are computationally expensive, but this 
> is no problem for small samples. So that's why non-parametric tests are often better in this 
> setting.
>
>

I agree with you that strong assumptions about the underlying distributions,
e.g. parametric modeling, may raise big practical concerns. The one advantage
is that at least the assumptions are stated explicitly.
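For completeness, the two approaches side by side on the same data. The
synthetic data and effect size are my own illustrative choices; note that
scipy.stats.permutation_test only appeared in much later SciPy releases
(1.8+), so at the time of this thread one would have written the resampling
loop by hand:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=8)
y = rng.normal(1.0, 1.0, size=8)

# Parametric: assumes both populations are normal with equal variances.
t_res = stats.ttest_ind(x, y)

# Non-parametric: permutation test on the same statistic; the only
# assumption is exchangeability of the observations under H_0.
def statistic(a, b):
    return np.mean(a) - np.mean(b)

p_res = stats.permutation_test((x, y), statistic,
                               permutation_type='independent',
                               n_resamples=9999)

print(t_res.pvalue, p_res.pvalue)
```

With n=8 per group the two p-values typically land in the same neighborhood
when the t test's assumptions happen to hold, but only the permutation test
keeps its level when they do not.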

Best,

Emanuele
