[SciPy-dev] Statistics Review progress

Robert Kern robert.kern at gmail.com
Wed Apr 12 11:48:21 CDT 2006

Alan G Isaac wrote:
> On Wed, 12 Apr 2006, Ed Schofield apparently wrote:
>
>>I can't think of anything better than 'biased'.  'Sample'
>>would be ambiguous, as Zachary mentioned.  Using
>>'unbiased' in the names would be incorrect for std, as
>>Zachary also mentioned
>
> 'central'?
> http://mathworld.wolfram.com/SampleCentralMoment.html

No, that's another concept entirely. The nth moment of a distribution is defined
around 0:

integrate(lambda x: (x-0)**n*pdf(x), -inf, inf)

The nth central moment of a distribution is defined around the mean of the
distribution:

integrate(lambda x: (x-mean)**n*pdf(x), -inf, inf)
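(To make the two definitions concrete, here is a runnable version of the pseudocode above using scipy.integrate.quad; the choice of a normal distribution with mean 2 and standard deviation 1 is mine, just for illustration.)

```python
# Numerically evaluate the nth raw moment (about 0) and nth central
# moment (about the mean) for an example normal distribution.
import math
from scipy.integrate import quad

mean, std = 2.0, 1.0  # illustrative choice, not from the post

def pdf(x):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def moment(n):
    # nth moment, defined around 0
    return quad(lambda x: (x - 0) ** n * pdf(x), -math.inf, math.inf)[0]

def central_moment(n):
    # nth central moment, defined around the distribution's mean
    return quad(lambda x: (x - mean) ** n * pdf(x), -math.inf, math.inf)[0]

print(moment(2))          # var + mean**2 = 5
print(central_moment(2))  # var = 1
```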

One can devise procedures ("estimators") to estimate these and other quantities
from actual data putatively sampled from these distributions. These estimators
are called "unbiased" when the distribution of the *estimated quantity*
(assuming we were to draw same-sized datasets from the same underlying
distribution many times and calculate estimates from each of them separately)
has a mean equal to the actual quantity.
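(You can see this definition in action with a quick simulation; the distribution and sample sizes below are arbitrary choices of mine.)

```python
# Draw many same-sized datasets from one distribution, compute the
# "unbiased" variance estimate from each, and check that the estimates
# average out to the true variance of the distribution.
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0  # normal with standard deviation 2

# 20000 independent datasets of size 10, one estimate per dataset
estimates = [rng.normal(0.0, 2.0, size=10).var(ddof=1) for _ in range(20000)]

print(np.mean(estimates))  # close to true_var = 4.0
```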

You could also define estimators that have other features, for example, picking
out the estimate that would provide the maximum likelihood of actually getting
the data in front of you. Maximum likelihood estimators are generally "biased."
Honestly, I think this is a good thing, but many people don't. For most of the
estimators that we are talking about here, the only difference between the two
is a coefficient near 1 (which gets closer to 1 as the size of the sample
increases). For example, the maximum likelihood estimator for (central!)
variance is ((x-x.mean())**2).sum()/len(x). The "unbiased" estimator is
((x-x.mean())**2).sum()/(len(x)-1).
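(Side by side, with the sample values invented for illustration; NumPy's ddof argument switches between the two forms.)

```python
# Maximum likelihood (divide by n) vs. "unbiased" (divide by n-1)
# estimates of the central variance, and their NumPy equivalents.
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up data
n = len(x)

mle = ((x - x.mean()) ** 2).sum() / n            # biased, divide by n
unbiased = ((x - x.mean()) ** 2).sum() / (n - 1)  # divide by n - 1

print(mle, x.var(ddof=0))       # 4.0 4.0
print(unbiased, x.var(ddof=1))  # 32/7 both times
```

The ratio between the two is (n-1)/n, the coefficient near 1 mentioned above.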

--
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma