[SciPy-user] stats review: std/var and samplestd/samplevar

Matthew Vernon matthew at sel.cam.ac.uk
Mon Apr 3 05:27:21 CDT 2006


I think the original poster meant (N-1) some of the time when they  
said (1-N).

> I would propose to have:
> (1) scipy.stats.var and scipy.stats.std -- use N as the denominator
> (2) scipy.stats.samplevar and scipy.stats.samplestd -- at least use
> n-1 as the denominator. Better would be to deprecate / remove them
> because as above "sample variance" is ambiguous.
> (3) scipy.stats.var_unbiased -- use n-1 as denominator. As per the
> note below, there is no general unbiased estimator of the standard
> deviation, and so there should be no scipy.stats.std_unbiased
> function. (See the Wikipedia entry and also
> http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc32.htm )
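
Point (3) can be checked exactly on a toy case. The sketch below is only an illustration, not anything in scipy: it assumes a made-up population of two equally likely values (0 and 1), enumerates every sample of size 2, and averages each estimator over those samples. The helper name var_with_denominator is hypothetical.

```python
import itertools
import math

# Assumed toy population: 0 and 1, equally likely.
# True variance is 0.25 and true standard deviation is 0.5.
population = [0.0, 1.0]
n = 2  # sample size


def var_with_denominator(sample, denom):
    """Sum of squared deviations from the sample mean, divided by denom."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / denom


# All equally likely samples of size n, drawn with replacement.
samples = list(itertools.product(population, repeat=n))

# Expected value of each estimator = average over all samples.
mean_var_n = sum(var_with_denominator(s, n) for s in samples) / len(samples)
mean_var_n1 = sum(var_with_denominator(s, n - 1) for s in samples) / len(samples)
mean_std_n1 = sum(
    math.sqrt(var_with_denominator(s, n - 1)) for s in samples
) / len(samples)

print(mean_var_n)   # N denominator: underestimates the true variance 0.25
print(mean_var_n1)  # N-1 denominator: matches the true variance 0.25
print(mean_std_n1)  # sqrt of the N-1 variance: still below the true std 0.5
```

In this toy case the N-1 variance averages to the true variance, but its square root does not average to the true standard deviation, which is the reason given above for not offering a std_unbiased.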

> I feel vaguely that the N-1 estimator is always problematic, because
> if you have a small enough sample that it makes a difference, you've
> got bigger problems than using N or N-1. Not that these problems are
> insurmountable, but you've got to have some statistical savvy to deal
> properly with them. As such, I think that the default functions (var
> and std) should just return the population statistics. But reasonable
> people may disagree.

Whilst you might argue that N vs N-1 isn't going to make much of a  
difference on a large sample, I am still strongly of the opinion that  
it should be an option.

Why not simply have scipy.stats.var (and std) with an option for
whether you want N or N-1 as the denominator?
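
A minimal sketch of what I have in mind, in plain Python. The keyword name `unbiased` and the signatures are my own assumption for illustration; this is not the existing scipy.stats interface.

```python
import math


def var(data, unbiased=False):
    """Variance of a sequence of numbers (hypothetical signature).

    unbiased=False divides by N (population variance);
    unbiased=True divides by N-1.
    """
    n = len(data)
    denom = n - 1 if unbiased else n
    if denom <= 0:
        raise ValueError("not enough data points for this denominator")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / denom


def std(data, unbiased=False):
    """Square root of var(); with unbiased=True this is the square root of
    the N-1 variance, which is not itself an unbiased estimate of the
    standard deviation."""
    return math.sqrt(var(data, unbiased=unbiased))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(var(data))                 # divides by N
print(var(data, unbiased=True))  # divides by N-1
print(std(data))
```

Defaulting to N keeps the behaviour proposed in (1), while anyone who wants N-1 can ask for it explicitly rather than hunting for a separately named function.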


Matthew Vernon MA VetMB LGSM MRCVS
Farm Animal Epidemiology and Informatics Unit
Department of Veterinary Medicine, University of Cambridge
