[Numpy-discussion] var bias reason?
Wed Oct 15 11:09:03 CDT 2008
On Wed, Oct 15, 2008 at 09:45:39AM -0500, Travis E. Oliphant wrote:
> Gabriel Gellner wrote:
> > Some colleagues noticed that var uses the biased formula by default in numpy;
> > searching for the reason only brought up:
> > http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias
> > which I totally agree with, but there was no response. Is there a reason for this?
> I will try to respond to this, as I was the one who made the change. I think
> there have been responses, but I have preferred to stay quiet
> rather than feed a flame war. Ultimately, it is a matter of preference,
> and I don't think everybody would give equal weight to all the
> arguments surrounding the decision.
> I will attempt to articulate my reasons: dividing by n gives the maximum
> likelihood estimator of variance, and I prefer that justification to
> the "un-biased" justification for a default (especially given that
> bias is just one part of the "error" in an estimator). Having every
> package that computes the variance return the "un-biased" estimate gives it
> more cultural weight than the concept deserves, I think. Any
> surprise created by the different default should be mitigated by
> the fact that it's an opportunity to learn something about what you are
> doing. Here is a paper I wrote on the subject that you might find useful.
> (Hopefully, they will resolve a link problem at the above site soon, but
> you can read the abstract.)
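To make the trade-off Travis describes concrete, here is a minimal Monte Carlo sketch (an illustration added here, not code from the thread). It assumes normally distributed data, a sample size of 5, and a NumPy recent enough to provide `np.random.default_rng` (1.17 or later). In current NumPy, `ndarray.var` with the default `ddof=0` divides by n, which is the maximum likelihood estimate under a normal model, while `ddof=1` divides by n - 1.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the run is reproducible
n, trials, sigma2 = 5, 200_000, 1.0   # small n makes the difference visible

# Each row is one sample of size n drawn from N(0, sigma2).
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

# ndarray.var divides by n by default (ddof=0); ddof=1 divides by n - 1.
var_n = samples.var(axis=1)            # maximum likelihood estimate (normal model)
var_n1 = samples.var(axis=1, ddof=1)   # "un-biased" estimate

# The same two estimates written out by hand for the first sample.
x = samples[0]
ss = ((x - x.mean()) ** 2).sum()
assert np.isclose(ss / n, var_n[0]) and np.isclose(ss / (n - 1), var_n1[0])

for name, est in [("divide by n", var_n), ("divide by n-1", var_n1)]:
    bias = est.mean() - sigma2
    mse = ((est - sigma2) ** 2).mean()  # mean squared error = bias**2 + variance
    print(f"{name:14s} bias={bias:+.3f}  mse={mse:.3f}")
```

Under normality, theory predicts for n = 5 and sigma^2 = 1 a bias of -0.2 and a mean squared error of 0.36 for the divide-by-n estimate, versus zero bias and a mean squared error of 0.5 for divide-by-(n - 1); the simulated values should land close to those. That is one sense in which bias is only one part of an estimator's error.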
Thanks for the reply; I look forward to reading the paper when it is
available. The major issue in my mind is not the technical one but the
surprise factor. I can't think of a single other package that uses this as the
default, and since it is also a method of ndarray (which is a built-in type
and can't be monkey patched), there is no way of taking a different view (that
is, supplying my own function) without the confusion I am feeling in my own lab
. . . (less technical people need to understand that they shouldn't
use a method of the same name).
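A sketch of the "supply my own function" workaround Gabriel mentions, assuming a lab is willing to standardize on a local helper; `sample_var` is a hypothetical name, not a NumPy function. It also illustrates his point that `ndarray.var` cannot be replaced (assigning to it raises `TypeError` because `ndarray` is a C-defined type), while noting that in current NumPy the method itself accepts a `ddof` argument.

```python
import numpy as np

def sample_var(a, axis=None, ddof=1):
    """Hypothetical lab-local helper: like np.var, but defaults to
    dividing by n - 1 (ddof=1) instead of NumPy's n (ddof=0)."""
    return np.var(a, axis=axis, ddof=ddof)

# Trying to change the default on the method itself does not work:
# ndarray is a C-defined (built-in) type, so its attributes are read-only.
try:
    np.ndarray.var = sample_var
    patched = True
except TypeError:
    patched = False

a = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 2.0, 2.0, 6.0]])

print(patched)                # the monkey patch was rejected
print(a.var(axis=1))          # method default: divides by n
print(a.var(axis=1, ddof=1))  # method with ddof=1: divides by n - 1
print(sample_var(a, axis=1))  # helper: divides by n - 1 by default
```

The catch Gabriel raises still applies: nothing stops someone from calling `a.var()` and silently getting the divide-by-n answer, so a helper like this only helps if everyone in the lab uses it.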
I worry about numpy taking this unpopular stance (as far as packages go)
simply to fight the good fight, and doing so in a built-in method of every
ndarray rather than only in a standalone function, which would present no
such problem, since a function leaves room for dissent on a clearly muddy issue.
Sorry for the noise, and I am happy to see there is a reason, but I can't help
but find this a wart, for purely pedagogical reasons.