[Numpy-discussion] var and std

Aarre Laakso aarre at pair.com
Tue Nov 28 06:42:05 CST 2006


I was wondering if someone could explain the rationale for changing 
.var() and .std() in release 1.0b1 from normalizing by n-1 (unbiased 
estimate from sample) to normalizing by n (population)?
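
For concreteness, the two normalizations in question can be sketched in
plain Python. This is only an illustration of the arithmetic: the helper
name `variance` and its `ddof` argument are assumptions made for this
sketch, not the signature of .var() in the release discussed here.

```python
def variance(data, ddof=0):
    """Variance of `data`, dividing the summed squared deviations by n - ddof.

    ddof=0 divides by n (population); ddof=1 divides by n-1 (sample estimate).
    `variance` and `ddof` are hypothetical names used only in this sketch.
    """
    n = len(data)
    if n - ddof <= 0:
        raise ValueError("need more than ddof data points")
    mean = sum(data) / n
    squared_dev = sum((x - mean) ** 2 for x in data)
    return squared_dev / (n - ddof)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
population_var = variance(data, ddof=0)  # normalizes by n
sample_var = variance(data, ddof=1)      # normalizes by n-1
```

The two results differ by a factor of n/(n-1), so any unit test that
hard-codes one value will fail against the other, which is the kind of
breakage described further down.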

I have found the note about this change in the Release Notes, the 
change itself in Changeset 2560, and a related documentation change in 
Ticket 388, but I have not been able to find a description of why the 
change was made, despite searching the website, the Trac and the 
mailing list.

I am aware of the argument that, if the difference between n and n-1 
matters to you, then you are "up to no good". On the other hand, this 
change breaks a lot of my unit tests. It also seems to violate the 
principle of least surprise: every other numerical environment that I 
have used divides by n-1 by default. Examples include MATLAB and R.

The interface also seems inconsistent: cov() still normalizes by n-1 
instead of n. cov() also has a 'bias' parameter that allows normalizing 
by n, which is similar to the compromises provided in the other 
numerical packages listed above. As an aside, cov() does not seem to be 
provided as a method, only as a function.
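
The inconsistency can be illustrated with a small pure-Python sketch. The
helper `cov` below and its `bias` flag only mimic the behaviour described
above (n-1 by default, n on request); it is a hypothetical stand-in, not
the library's implementation.

```python
def cov(x, y, bias=False):
    """Covariance of two equal-length sequences.

    Divides by n-1 by default and by n when bias=True. Hypothetical helper
    for illustration only.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    divisor = n if bias else n - 1
    return cross / divisor


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

cov_default = cov(x, y)             # normalizes by n-1
cov_biased = cov(x, y, bias=True)   # normalizes by n
var_via_cov = cov(x, x)             # n-1 variance of x
var_by_n = cov(x, x, bias=True)     # what an n-normalized variance returns
```

With these conventions, the covariance of x with itself under the default
setting does not equal a variance of x normalized by n, so the two only
agree when 'bias' is set explicitly.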

In light of all that, I am sure there must have been a good reason for 
the change, and I am very curious what it was. Thanks for any insight 
you can offer.


Aarre Laakso
