# [Numpy-discussion] non-standard standard deviation

josef.pktd@gmai... josef.pktd@gmai...
Sun Dec 6 10:21:09 CST 2009

```
On Sun, Dec 6, 2009 at 11:01 AM, Colin J. Williams <cjw@ncf.ca> wrote:
>
>
> On 04-Dec-09 10:54 AM, Bruce Southey wrote:
>> On 12/04/2009 06:18 AM, yogesh karpate wrote:
>>> @ Pauli and @ Colin:
>>> Sorry for the late reply. I was busy with some other assignments.
>>> # As far as normalization by (n) is concerned, the common
>>> assumption is that the population is normally distributed and the
>>> population size is large enough to fit the normal distribution. But
>>> this standard deviation, when applied to a small sample, tends to be
>>> too low; therefore it is called biased.
>>> # The correction known as Bessel's correction is there for the
>>> small-sample std. deviation, i.e. normalization by (n-1).
>>> # In "Electrical and Electronic Measurements and Instrumentation" by
>>> A.K. Sawhney, in the 1st chapter of the book, "Fundamentals of
>>> Measurements", it is shown that for N=16 the std. deviation
>>> normalization was (n-1)=15.
>>> # While I was learning statistics in my course, the instructor would
>>> advise taking n=20 for normalization by (n-1).
>>> # Probability and Statistics from the Schaum's series is good reading.
>>> Regards
>>> ~ymk
>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>> Hi,
>> Basically, all that I see with these arbitrary values is that you are
>> relying on the 'central limit theorem'
>> (http://en.wikipedia.org/wiki/Central_limit_theorem).  Really, the
>> issue in using these values is how much statistical bias you will
>> tolerate, especially in how the estimate is then used, because
>> uses of the variance (such as in statistical tests) tend to be more
>> influenced by bias than the estimate of variance itself. (Of course,
>> many features rely on asymptotic properties, so bias concerns are less
>> apparent in large sample sizes.)
>>
>> Obviously the default relies on the developers' background and
>> requirements. There are multiple valid variance estimators in
>> statistics with different denominators, like N (maximum likelihood
>> estimator), N-1 (restricted maximum likelihood estimator and certain
>> Bayesian estimators) and Stein's
>> (http://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator). So
>> the current default behavior is valid and documented. Consequently
>> you cannot just have one option or different functions (like certain
>> programs), and Numpy's implementation actually allows you to do all
>> these in a single function. So I also see no reason to change, even if
>> I have to add the ddof=1 argument; after all, 'Explicit is better than
>> implicit' :-).
>>
>> Bruce
> Bruce,
>
> I suggest that the Central Limit Theorem is tied in with the Law of
> Large Numbers.
>
> When one has a smallish sample size, what gives the best estimate of the
> variance?  The Bessel Correction provides a rationale, based on
> expectations: (http://en.wikipedia.org/wiki/Bessel%27s_correction).
>
> It is difficult to understand the proof of Stein:
> http://en.wikipedia.org/wiki/Proof_of_Stein%27s_example
>
> The symbols used are not clearly stated.  He seems interested in a
> decision rule for the calculation of the mean of a sample and claims
> that his approach is better than the traditional Least Squares approach.
>
> In most cases, the interest is likely to be in the variance, with a view
> to establishing a confidence interval.

What's the best estimate? That's the main question.

Estimators differ in their (sample or posterior) distribution,
especially in bias and variance.
The Stein estimator dominates OLS in mean squared error: although
it is biased, its variance is enough smaller than that of OLS that its
MSE (squared bias plus variance) is also smaller.
Depending on the application there could be many possible loss functions,
including asymmetric ones, e.g. if it's more costly to overestimate than
to underestimate.

The following was a good book on this, which I read a long time ago:
Statistical Decision Theory and Bayesian Analysis, by James O. Berger

>
> In the widely used Analysis of Variance (ANOVA), the degrees of freedom
> are reduced for each mean estimated, see:
> http://www.mnstate.edu/wasson/ed602lesson13.htm for the example below:
>
> *Analysis of Variance Table*
>
> Source of        Sum of    Degrees of   Mean
> Variation        Squares   Freedom      Square   F Ratio   p
> Between Groups   25.20      2           12.60    5.178     <.05
> Within Groups    29.20     12            2.43
> Total            54.40     14
>
>
> There is a sample of 15 observations, which is divided into three
> groups, depending on the number of hours of therapy.
> Thus, the Total degrees of freedom are 15-1 = 14,  the Between Groups
> 3-1 = 2 and the Residual is 14 - 2 = 12.

Statistical tests are the only area where I really pay attention to the
degrees of freedom, since the test statistic is derived under specific
assumptions.
But there are also many cases where different statisticians argue
in favor of different dof corrections, and it is not always clear in which
cases one or another is the "best".

Josef
>
> Colin W.
>
```
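
The two technical points in this thread, the `ddof` argument and the bias/variance trade-off behind the choice of denominator, can be sketched numerically. What follows is only an illustration and not part of the original message: the helper name `compare_variance_estimators`, the sample size, the number of trials and the seed are hypothetical choices, and the simulated data are assumed to be normally distributed.

```python
import numpy as np

# Small fixed data set: mean 5, sum of squared deviations 32.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# np.var and np.std divide by N - ddof; ddof=0 is the default.
var_n = np.var(x)            # 32 / 8
var_n1 = np.var(x, ddof=1)   # 32 / 7 (Bessel's correction)


def compare_variance_estimators(n=5, trials=20000, true_var=4.0, seed=0):
    """Estimate bias and MSE of the N and N-1 variance estimators.

    Illustrative helper (not a NumPy function): draws `trials` normal
    samples of size `n` and returns {ddof: (bias, mse)}.
    """
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))
    results = {}
    for ddof in (0, 1):
        # One variance estimate per simulated sample (row).
        est = np.var(samples, axis=1, ddof=ddof)
        bias = est.mean() - true_var
        mse = np.mean((est - true_var) ** 2)
        results[ddof] = (bias, mse)
    return results


results = compare_variance_estimators()
for ddof, (bias, mse) in results.items():
    print(f"ddof={ddof}: bias={bias:+.3f}, mse={mse:.3f}")
```

Under the normality assumption, the N-1 estimator is unbiased, while the N estimator is biased low but has the smaller mean squared error. Which of the two counts as "best" therefore depends on the loss function, which is the point made in the reply above.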