[Numpy-discussion] var and std
aarre at pair.com
Tue Nov 28 06:42:05 CST 2006
I was wondering if someone could explain the rationale for changing
.var() and .std() in release 1.0b1 from normalizing by n-1 (unbiased
estimate from sample) to normalizing by n (population)?
I have found the note about this change in the Release Notes, the
change itself in Changeset 2560, and a related documentation change in
Ticket 388, but I have not been able to find a description of why the
change was made, despite searching the website, the Trac and the
mailing list.
I am aware of the argument that, if the difference between n and n-1
matters to you, then you are "up to no good". On the other hand, this
change breaks a lot of my unit tests. It also seems to violate the
principle of least surprise: every other numerical environment that I
have used divides by n-1 by default. Examples include MATLAB:
It also seems to present an inconsistent interface: cov() still
normalizes by n-1 instead of n. cov() does have a 'bias' parameter that
allows normalizing by n, which is similar to the compromises provided in
the other numerical packages listed above. As an aside, cov() also does
not seem to be provided as a method, only as a function.
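To make the inconsistency concrete, here is a minimal sketch, assuming the behaviour described above (.var() dividing by n, cov() dividing by n-1 unless 'bias' is set). The sample values and the variable names are made up for illustration only.

```python
import numpy as np

# Hypothetical sample data, chosen only so the arithmetic is easy to follow.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

# Sum of squared deviations from the mean, shared by both normalizations.
ss = ((x - x.mean()) ** 2).sum()

var_n = ss / n          # population normalization (divide by n)
var_n1 = ss / (n - 1)   # sample normalization (divide by n-1)

# .var() called with no arguments divides by n.
assert np.isclose(x.var(), var_n)

# cov() divides by n-1 by default; bias=True switches it to n.
assert np.isclose(np.cov(x), var_n1)
assert np.isclose(np.cov(x, bias=True), var_n)

print(var_n, var_n1)
```

So for the same one-dimensional data, x.var() and cov(x) disagree unless 'bias' is passed to cov(), which is the kind of difference that shows up in my unit tests.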
In light of all that, I am sure there must have been a good reason for
the change, and I am very curious what it was. Thanks for any insight
you can offer.