[Numpy-discussion] non-standard standard deviation
Fri Dec 4 09:54:51 CST 2009
On 12/04/2009 06:18 AM, yogesh karpate wrote:
> @ Pauli and @ Colin:
> Sorry for the late reply. I was busy
> in some other assignments.
> # As far as normalization by (n) is concerned, it is a common
> assumption that the population is normally distributed and that the
> population size is large enough to fit the normal distribution. But this
> standard deviation, when applied to a small population, tends to be
> too low; therefore it is called biased.
> # The correction known as Bessel's correction exists for the small-sample
> std. deviation, i.e. normalization by (n-1).
> # In "electrical-and-electronic-measurements-and-instrumentation" by
> A.K. Sawhney, in the first chapter of the book, "Fundamentals of
> Measurements", it is shown that for N=16 the std. deviation
> normalization was (n-1)=15.
> # While I was learning statistics in my course, the instructor would advise
> to take n=20 for normalization by (n-1).
> # Probability and Statistics from the Schaum's series is good reading.
Basically, all that I see with these arbitrary values is that you are
relying on the 'central limit theorem'
(http://en.wikipedia.org/wiki/Central_limit_theorem). Really, the issue
in using these values is how much statistical bias you will tolerate,
especially in how that estimate is used, because uses of the
variance (such as in statistical tests) tend to be more influenced by
bias than the estimate of variance itself. (Of course, many features rely on
asymptotic properties, so bias concerns are less apparent in large samples.)
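
To make the point about bias and sample size concrete, here is a rough
simulation sketch. It is only an illustration under stated assumptions:
the data are drawn from a standard normal distribution (true variance 1),
and the seed, the trial count, and the helper name mean_variance_estimates
are arbitrary choices of this example, not anything from NumPy itself.

```python
import numpy as np

# Assumption: standard normal data, so the true variance is 1.0.
rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only


def mean_variance_estimates(n, trials=20000):
    """Average the N and N-1 variance estimates over many samples of size n."""
    samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))
    biased = samples.var(axis=1, ddof=0).mean()    # denominator N
    unbiased = samples.var(axis=1, ddof=1).mean()  # denominator N-1
    return biased, unbiased


small = mean_variance_estimates(5)    # small samples: N estimator is low
large = mean_variance_estimates(100)  # large samples: the gap nearly vanishes

print("n=5:   ddof=0 -> %.3f, ddof=1 -> %.3f" % small)
print("n=100: ddof=0 -> %.3f, ddof=1 -> %.3f" % large)
```

With the N denominator the average estimate should sit near (n-1)/n times
the true variance, so the shortfall is noticeable at n=5 and small at n=100.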
Obviously the default relies on the developers' background and
requirements. There are multiple valid variance estimators in statistics
with different denominators like N (maximum likelihood estimator), N-1
(restricted maximum likelihood estimator and certain Bayesian
estimators) and Stein's
the current default behavior is valid and documented. Consequently you
cannot just have one option or different functions (like certain
programs), and NumPy's implementation actually allows you to do all these in
a single function. So I also see no reason to change, even if I have to add
the ddof=1 argument; after all, 'Explicit is better than implicit' :-).
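
For anyone following along, a minimal sketch of what the ddof argument
does; the data values are made up for illustration. np.var and np.std
divide by N - ddof, and the default is ddof=0.

```python
import numpy as np

# Made-up sample: mean is 5.0, sum of squared deviations is 32.0.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = data.size

var_n = np.var(data)            # default ddof=0: divide by N
var_n1 = np.var(data, ddof=1)   # ddof=1: divide by N-1 (Bessel's correction)
std_n = np.std(data)            # square root of var_n
std_n1 = np.std(data, ddof=1)   # square root of var_n1

print(var_n, var_n1)
print(std_n, std_n1)
```

The two variance estimates differ only by the factor (N-1)/N, so one
function with an explicit ddof covers both conventions.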