[Numpy-tickets] [NumPy] #559: var() on a size=1 array should raise exception.
NumPy
numpy-tickets@scipy....
Mon Jul 30 09:25:01 CDT 2007
#559: var() on a size=1 array should raise exception.
--------------------+-------------------------------------------------------
Reporter: gpk | Owner: somebody
Type: defect | Status: new
Priority: normal | Milestone:
Component: Other | Version: none
Severity: normal | Keywords:
--------------------+-------------------------------------------------------
Strictly speaking, a single value does not have a variance.
"Whoa!" you might say. "I can compute the sum of the squared
distances from the mean even when there is only one value.
It's zero."
That's true, but misleading. By calling the method "var", you
imply that it returns a variance, and variance is undefined
unless you have two or more values.
More importantly, even the biased estimator of
variance ( sum((x-xbar)^2)/N ) does not give a meaningful
estimate when N==1. All estimates of the variance are equally
good or equally bad when you have only one value. Follow the
logic to compute the minimum-variance estimator of variance
for a single sample, and you'll see all kinds of absurdities,
including characteristic functions for the probability
distribution that do not have a limiting value.
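
To make the contrast between the two estimators concrete, here is a small sketch. It assumes a NumPy release whose var() accepts the ddof argument, which may postdate this ticket. With ddof=0 the divisor is N, so a single value gives 0/1; with ddof=1 the divisor is N-1, so the same value gives 0/0.

```python
import warnings

import numpy as np

x = np.array([1.0])

# Biased estimator: sum((x - xbar)**2) / N, which is 0/1 for one value.
biased = x.var()

# Unbiased estimator: sum((x - xbar)**2) / (N - 1), which is 0/0 here.
# NumPy reports this with a RuntimeWarning and returns nan; the warning
# is silenced only to keep the example output short.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    unbiased = x.var(ddof=1)

print(biased, unbiased)
```

The nan in the second case at least signals that something is wrong; the 0.0 in the first case looks like a legitimate result.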
From a practical point of view, anyone who takes the variance
of a single value is probably in deep trouble. Variances
tend to get used for F-tests, which are undefined for zero
degrees of freedom. They get used to build confidence intervals
for t-tests, but the t-distribution is undefined for zero degrees
of freedom. Some people will construct confidence intervals
via [mean-3*sqrt(var), mean+3*sqrt(var)] or some similar
approximation: this will fail badly and lead to tears.
The same logic applies to std(), of course.
{{{
>>> import numpy
>>> x = numpy.array([1.0])
>>> x
array([ 1.])
>>> x.var()
0.0
>>>
}}}
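
As a rough sketch of the behaviour being requested, the check could be prototyped in a wrapper rather than inside NumPy itself. The names strict_var, strict_std and _require_two_values are hypothetical and not part of NumPy, and the choice of ValueError is an assumption: the ticket does not say which exception should be raised.

```python
import numpy as np


def _require_two_values(a, axis):
    """Raise ValueError if fewer than two values would enter the estimate."""
    n = a.size if axis is None else a.shape[axis]
    if n < 2:
        raise ValueError(f"variance requires at least two values, got {n}")


def strict_var(a, axis=None, ddof=0):
    """Hypothetical wrapper: like ndarray.var(), but refuses N < 2."""
    a = np.asarray(a)
    _require_two_values(a, axis)
    return a.var(axis=axis, ddof=ddof)


def strict_std(a, axis=None, ddof=0):
    """Hypothetical wrapper: like ndarray.std(), but refuses N < 2."""
    a = np.asarray(a)
    _require_two_values(a, axis)
    return a.std(axis=axis, ddof=ddof)
```

With these definitions, strict_var([1.0, 3.0]) behaves like the ordinary var(), while strict_var([1.0]) and strict_std([1.0]) raise ValueError instead of quietly returning 0.0.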
--
Ticket URL: <http://projects.scipy.org/scipy/numpy/ticket/559>
NumPy <http://projects.scipy.org/scipy/numpy>
The fundamental package needed for scientific computing with Python.