[Numpy-discussion] var bias reason?
Wed Oct 15 11:26:18 CDT 2008
While I disagree, I really do not care because this is documented. But
perhaps a clear warning is needed at the start so it is clear what the
default ddof means, instead of it being buried in the Notes section.
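
For readers unsure what the default ddof means in practice, here is a
minimal sketch. The array x is made-up illustration data; the only
library calls used are np.var and its ddof argument.

```python
import numpy as np

# Made-up illustration data (not from the thread).
x = np.array([1.0, 2.0, 3.0, 4.0])
n = x.size

# Sum of squared deviations from the sample mean.
ss = np.sum((x - x.mean()) ** 2)

# Default: ddof=0, so the divisor is n (the "biased" / maximum
# likelihood form for normal data).
default_var = np.var(x)

# ddof=1 changes the divisor to n - 1 (the "unbiased" form).
sample_var = np.var(x, ddof=1)

print(default_var, ss / n)        # both use divisor n
print(sample_var, ss / (n - 1))   # both use divisor n - 1
```

In general the divisor is n - ddof, so ddof is simply the amount
subtracted from the sample size.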
Also I am surprised that you did not directly reference the Stein
estimator (your minimum mean-squared estimator) and known effects in
So I did not find this any different from what is already known about
the Stein estimator.
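
As a rough illustration of the mean-squared-error point, the following
sketch simulates normal samples and compares the divisors n - 1, n and
n + 1. It is not taken from the paper; the sample size, trial count and
seed are arbitrary assumptions, and the result applies to normally
distributed data only.

```python
import numpy as np

# Arbitrary simulation settings (assumptions for illustration).
rng = np.random.default_rng(0)
n, trials, true_var = 10, 200_000, 1.0

# Each row is one sample of size n from N(0, true_var).
samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))

# Sum of squared deviations from each row's own mean.
centered = samples - samples.mean(axis=1, keepdims=True)
ss = np.sum(centered ** 2, axis=1)

# Empirical mean squared error of ss / d for three divisors.
mse = {d: np.mean((ss / d - true_var) ** 2) for d in (n - 1, n, n + 1)}

for d, value in mse.items():
    print(f"divisor {d}: MSE ~ {value:.4f}")
```

For normal data the divisor n - 1 removes the bias but has the largest
mean squared error of the three, while n + 1 has the smallest.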
PS: While I may have gotten access via my university, I did get it from
the "Access this item" link.
Travis E. Oliphant wrote:
> Gabriel Gellner wrote:
>> Some colleagues noticed that var uses biased formulas by default in numpy,
>> searching for the reason only brought up:
>> which I totally agree with, but there was no response? Any reason for this?
> I will try to respond to this as it was me who made the change. I think
> there have been responses, but I think I've preferred to stay quiet
> rather than feed a flame war. Ultimately, it is a matter of preference
> and I don't think there would be equal weights given to all the
> arguments surrounding the decision by everybody.
> I will attempt to articulate my reasons: dividing by n is the maximum
> likelihood estimator of variance and I prefer that justification more
> than the "un-biased" justification for a default (especially given that
> bias is just one part of the "error" in an estimator). Having every
> package that computes the mean return the "un-biased" estimate gives it
> more cultural weight than the concept deserves, I think. Any
> surprise that is created by the different default should be mitigated by
> the fact that it's an opportunity to learn something about what you are
> doing. Here is a paper I wrote on the subject that you might find
> (Hopefully, they will resolve a link problem at the above site soon, but
> you can read the abstract).
> I'm not trying to persuade anybody with this email (although if you can
> download the paper at the above link, then I am trying to persuade with
> that). In this email I'm just trying to give context to the poster as I
> think the question is legitimate.
> With that said, there is the ddof parameter so that you can change what
> the divisor is. I think that is a useful compromise.
> I'm unhappy with the internal inconsistency of cov, as I think it was an
> oversight. I'd be happy to see cov changed as well to use the ddof
> argument instead of the bias keyword, but that is an API change and
> requires some transition discussion and work.
> The only other argument I've heard against the current situation is
> "unit testing" with MATLAB or R code. Just use ddof=1 when comparing
> against MATLAB and R code is my suggestion.
> Best regards,
> Numpy-discussion mailing list
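
To make the var/cov inconsistency and the ddof=1 suggestion above
concrete, here is a small sketch with made-up data. Note one assumption
relative to the 2008 discussion: recent NumPy releases accept a ddof
argument in np.cov in addition to the bias keyword, which may not match
the version being discussed in the thread.

```python
import numpy as np

# Made-up illustration data (not from the thread).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

# var defaults to divisor n; cov defaults to divisor n - 1.
var_default = np.var(x)
cov_default = np.cov(x, y)[0, 0]   # variance of x on the diagonal

# Making the two agree, either way round.
var_ddof1 = np.var(x, ddof=1)                  # matches cov's default
cov_biased = np.cov(x, y, bias=True)[0, 0]     # matches var's default
cov_ddof0 = np.cov(x, y, ddof=0)[0, 0]         # same, via ddof (recent NumPy)

print(var_default, cov_biased, cov_ddof0)   # divisor n
print(var_ddof1, cov_default)               # divisor n - 1
```

When checking results against code that divides by n - 1 (as the
MATLAB and R comparison above assumes), np.var(x, ddof=1) is the value
to compare.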