[Numpy-discussion] var bias reason?

David Cournapeau cournape@gmail....
Wed Oct 15 10:19:54 CDT 2008

On Wed, Oct 15, 2008 at 11:45 PM, Travis E. Oliphant
<oliphant@enthought.com> wrote:
> Gabriel Gellner wrote:
>> Some colleagues noticed that var uses biased formulas by default in numpy,
>> searching for the reason only brought up:
>> http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias
>> which I totally agree with, but there was no response? Any reason for this?
> I will try to respond to this as it was me who made the change.  I think
> there have been responses, but I think I've preferred to stay quiet
> rather than feed a flame war.   Ultimately, it is a matter of preference
> and I don't think there would be equal weights given to all the
> arguments surrounding the decision by everybody.
> I will attempt to articulate my reasons:  dividing by n is the maximum
> likelihood estimator of variance and I prefer that justification more
> than the "un-biased" justification for a default (especially given that
> bias is just one part of the "error" in an estimator).    Having every
> package that computes the mean return the "un-biased" estimate gives it
> more cultural weight than the concept deserves, I think.  Any
> surprise that is created by the different default should be mitigated by
> the fact that it's an opportunity to learn something about what you are
> doing.    Here is a paper I wrote on the subject that you might find
> useful:
> https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1
> (Hopefully, they will resolve a link problem at the above site soon, but
> you can read the abstract).

Yes, I hope so too; I would be happy to read the article.

On the limits of unbiasedness, the following document mentions an
example (in a different context than variance estimation):


AFAIK, even statisticians who consider themselves "mostly
frequentist" (if that makes any sense) no longer advocate unbiasedness
as such an important concept (Larry Wasserman mentions this in his
"All of Statistics").
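
To make the point about bias being only one part of the error concrete,
here is a rough sketch (not taken from the thread). It checks that
np.var divides by n by default and by n - 1 with ddof=1, then compares
bias and mean squared error of the two estimators by simulation. The
Gaussian data, the sample size of 5, the true variance of 4, the seed
and all variable names are assumptions chosen only for illustration.

```python
import numpy as np

# Small fixed sample: mean is 5, sum of squared deviations is 32.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size
sq_dev = (x - x.mean()) ** 2

var_n = np.var(x)            # default ddof=0: divide by n (Gaussian ML estimate)
var_n1 = np.var(x, ddof=1)   # ddof=1: divide by n - 1 ("unbiased" estimate)

assert np.isclose(var_n, sq_dev.sum() / n)
assert np.isclose(var_n1, sq_dev.sum() / (n - 1))

# Monte Carlo comparison on Gaussian data (illustrative settings only).
rng = np.random.default_rng(0)
true_var = 4.0
samples = rng.normal(0.0, np.sqrt(true_var), size=(200_000, 5))

est_n = samples.var(axis=1)            # one estimate per row, divisor n
est_n1 = samples.var(axis=1, ddof=1)   # one estimate per row, divisor n - 1

bias_n = est_n.mean() - true_var
bias_n1 = est_n1.mean() - true_var
mse_n = np.mean((est_n - true_var) ** 2)
mse_n1 = np.mean((est_n1 - true_var) ** 2)

print("divide by n    : bias", bias_n, "mse", mse_n)
print("divide by n - 1: bias", bias_n1, "mse", mse_n1)
```

Under these assumptions the n - 1 estimator shows a bias near zero,
while the n estimator is biased downward but has the smaller mean
squared error. That holds for Gaussian samples; other distributions
can behave differently.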


