[Numpy-discussion] def of var of complex
Robert Kern
robert.kern@gmail....
Tue Jan 8 19:54:07 CST 2008
Neal Becker wrote:
> I noticed that if I generate complex rv i.i.d. with var=1, that numpy says:
>
> var (<real part>) -> (close to 1.0)
> var (<imag part>) -> (close to 1.0)
>
> but
>
> var (complex array) -> (close to complex 0)
>
> Is that not a strange definition?
There is some discussion on this in the tracker.
http://projects.scipy.org/scipy/numpy/ticket/638
The current state of affairs is that the implementation of var() just naively
applies the standard formula for real numbers.
mean((x - mean(x)) ** 2)
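To see why this formula lands near zero for Neal's example, here is a minimal sketch. It assumes a NumPy recent enough to provide `np.random.default_rng`. The helper `naive_var` is a hypothetical name that writes the real-number formula out by hand. It is spelled out explicitly because newer NumPy releases take the absolute value before squaring in `np.var`, so calling `np.var` there would not reproduce the behaviour described in this thread. For a centered z = a + ib, (a + ib)^2 = a^2 - b^2 + 2iab, so the expectation is var(a) - var(b) + 2i cov(a, b). That is 0 when both parts have variance 1 and are independent.

```python
import numpy as np

def naive_var(x):
    """Hypothetical helper: the real-number variance formula applied verbatim."""
    x = np.asarray(x)
    return np.mean((x - np.mean(x)) ** 2)

rng = np.random.default_rng(0)
n = 100_000
# i.i.d. real and imaginary parts, each with variance 1 (illustrative choice)
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)

print(np.var(z.real), np.var(z.imag))  # both close to 1.0
print(naive_var(z))                    # complex, close to 0+0j
```

On a real array, `naive_var` agrees with `np.var`. The cancellation only appears once the squaring is complex.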
I think this is pretty obviously wrong prima facie. AFAIK, no one considers this
a valid definition of variance for complex RVs or in fact a useful value. I
think we should change this. Unfortunately, there is no single alternative but
several.
1. Punt. Complex numbers are inherently multidimensional, and a single scale
parameter doesn't really describe most distributions of complex numbers.
Instead, you need a real covariance matrix which you can get with cov([z.real,
z.imag]). This estimates the covariance matrix of a 2-D Gaussian distribution
over RR^2 (interpreted as CC).
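Option 1 as a minimal sketch, again assuming NumPy. The correlated distribution below is an illustrative choice and does not come from the ticket. `np.cov` treats each row as a variable, so passing `[z.real, z.imag]` gives the 2x2 covariance of the real and imaginary parts. The nonzero off-diagonal entry is exactly the kind of structure a single scale parameter cannot describe.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
# Illustrative non-circular distribution: Im z is correlated with Re z.
# var(a) = 1, var(b) = 0.25 + 1 = 1.25, cov(a, b) = 0.5
a = rng.standard_normal(n)
b = 0.5 * a + rng.standard_normal(n)
z = a + 1j * b

# 2x2 real covariance matrix of (Re z, Im z); each row of the input is a variable
C = np.cov([z.real, z.imag])
print(C)  # approx [[1.0, 0.5], [0.5, 1.25]]
```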
2. Take a slightly less naive formula for the variance which seems to show up in
some texts:
mean(absolute(z - mean(z)) ** 2)
This estimates the single parameter of a circular Gaussian over RR^2
(interpreted as CC). It is also the trace of the covariance matrix above.
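A sketch of option 2 under the same NumPy assumption. `circular_var` is a hypothetical helper name. Because |w|^2 = Re(w)^2 + Im(w)^2, the mean squared modulus equals var(Re z) + var(Im z), which is the trace of the covariance matrix. This holds only when both sides use the same 1/N normalization, hence `bias=True` in `np.cov`, whose default divides by N - 1. Recent NumPy releases compute this same quantity when `np.var` is given complex input.

```python
import numpy as np

def circular_var(z):
    """Hypothetical helper: mean squared modulus of deviations (real, >= 0)."""
    z = np.asarray(z)
    return np.mean(np.abs(z - np.mean(z)) ** 2)

rng = np.random.default_rng(2)
n = 10_000
# i.i.d. parts with variance 1 each, so the expected value is 1 + 1 = 2
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)

v = circular_var(z)
# bias=True divides by N, matching the 1/N in np.mean; the default
# (N - 1) would differ by a factor of N / (N - 1)
C = np.cov([z.real, z.imag], bias=True)
print(v, np.trace(C))  # equal up to rounding, both close to 2.0
```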
3. Take the variances of the real and imaginary components independently. This
is equivalent to taking the diagonal of the covariance matrix above. This
wouldn't be the definition of "*the* complex variance" that anyone else uses,
but rather another form of punting. "There isn't a single complex variance to
give you, but in the spirit of broadcasting, we'll compute the marginal
variances of each dimension independently."
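Option 3 in the same style, assuming NumPy. `marginal_vars` is a hypothetical helper, and the variances 1 and 4 are illustrative. The two entries match the diagonal of the 1/N-normalized covariance matrix, and their sum is the option-2 value.

```python
import numpy as np

def marginal_vars(z, ddof=0):
    """Hypothetical helper: variances of Re z and Im z, computed separately."""
    z = np.asarray(z)
    return np.array([np.var(z.real, ddof=ddof), np.var(z.imag, ddof=ddof)])

rng = np.random.default_rng(3)
n = 10_000
# Real part with variance 1, imaginary part with variance 4 (illustrative)
z = rng.standard_normal(n) + 2j * rng.standard_normal(n)

m = marginal_vars(z)
C = np.cov([z.real, z.imag], bias=True)
print(m)           # approx [1.0, 4.0]
print(np.diag(C))  # same values as m, up to rounding
```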
Personally, I like 1 a lot. I'm hesitant to support 2 until I've seen an actual
application of that definition. The references I have been given in the ticket
comments are all early parts of books where the authors are laying out
definitions without applications. It feels to me like the authors are sticking
in the absolute()'s ex post facto just so they can extend the
definition they already have to complex numbers. I'm also not a fan of the
expectation-centric treatments of random variables. IMO, the variance of an
arbitrary RV isn't an especially important quantity. It's a parameter of a
Gaussian distribution, and in this case, I see no reason to favor circular
Gaussians in CC over general ones.
But if someone shows me an actual application of the definition, I can amend my
view.
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco