[Numpy-discussion] non-standard standard deviation

Bruce Southey bsouthey@gmail....
Fri Dec 4 09:54:51 CST 2009


On 12/04/2009 06:18 AM, yogesh karpate wrote:
> @ Pauli and @ Colin:
> Sorry for the late reply. I was busy with some other assignments.
> # As far as normalization by (n) is concerned, the common
> assumption is that the population is normally distributed and the
> population size is large enough to fit the normal distribution. But this
> standard deviation, when applied to a small population, tends to be
> too low; therefore it is called biased.
> # The correction known as Bessel's correction is there for the
> small-sample std. deviation, i.e. normalization by (n-1).
> # In "Electrical and Electronic Measurements and Instrumentation" by
> A.K. Sawhney, in the 1st chapter of the book, "Fundamentals of
> Measurements", it is shown that for N=16 the std. deviation
> normalization was (n-1)=15.
> # While I was learning statistics in my course, the instructor would
> advise to take n=20 for normalization by (n-1).
> # Probability and Statistics in the Schaum's series is good reading.
> Regards
> ~ymk
>
>
Hi,
Basically, all that I see with these arbitrary values is that you are
relying on the 'central limit theorem'
(http://en.wikipedia.org/wiki/Central_limit_theorem). Really, the issue
in using these values is how much statistical bias you will tolerate,
especially in how the estimate gets used, because uses of the variance
(such as in statistical tests) tend to be more influenced by bias than
the estimate of variance itself. (Of course, many such uses rely on
asymptotic properties, so bias concerns are less apparent with large
sample sizes.)
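
To make the large-sample remark concrete, here is a rough sketch. The
sample sizes, seed and variable names are my own choices for
illustration, not anything from the thread. For any data set, the
N-denominator estimate equals the (N-1)-denominator estimate times
(N-1)/N, so the gap between the two shrinks as N grows:

```python
import numpy as np

# Hypothetical sample sizes, chosen only for illustration.
sample_sizes = (5, 20, 1000)
rng = np.random.default_rng(0)

ratios = {}
for n in sample_sizes:
    x = rng.normal(size=n)
    # ddof=0 divides by n and ddof=1 divides by n - 1, so the ratio of
    # the two estimates is (n - 1) / n whatever the data happen to be.
    ratios[n] = np.var(x, ddof=0) / np.var(x, ddof=1)
    print(n, ratios[n])
```

The printed ratios should come out close to 0.8, 0.95 and 0.999, which
is why the choice of denominator matters mostly for small samples.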

Obviously the default reflects the developers' background and
requirements. There are multiple valid variance estimators in statistics
with different denominators, like N (maximum likelihood estimator), N-1
(restricted maximum likelihood estimator and certain Bayesian
estimators) and Stein's
(http://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator). So
the current default behavior is valid and documented. Consequently, you
cannot just have one option, or different functions (like certain
programs), and Numpy's implementation actually allows you to do all of
these in a single function. So I also see no reason to change, even if I
have to add the ddof=1 argument; after all, 'Explicit is better than
implicit' :-).
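
For anyone following along, a minimal sketch of what that looks like in
practice. The data values are made up for illustration; the only NumPy
features assumed are the ddof argument of np.var and np.std, for which
the divisor is N - ddof:

```python
import numpy as np

# Made-up data, purely for illustration.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size
ss = np.sum((x - x.mean()) ** 2)  # sum of squared deviations from the mean

var_n = np.var(x)            # default ddof=0: divides by N
var_n1 = np.var(x, ddof=1)   # ddof=1: divides by N - 1
std_n1 = np.std(x, ddof=1)   # np.std takes the same ddof argument

print(var_n, ss / n)         # these two should agree
print(var_n1, ss / (n - 1))  # and so should these
print(std_n1)
```

Other denominators are reached by changing that one argument, so no
separate function per estimator is needed.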

Bruce




