[Numpy-discussion] Cross-covariance function

Pierre Haessig pierre.haessig@crans....
Wed Feb 1 10:47:42 CST 2012

Hi Bruce,
Sorry for the delay in the answer.

On 27/01/2012 17:28, Bruce Southey wrote:
> The output is still a covariance so do we really need yet another set 
> of very similar functions to maintain?
> Or can we get away with a new keyword?
The idea of an additional keyword seems appealing.
Just to make sure I understood it correctly, you would be proposing a new 
signature like:
def cov(.... get_full_cov_matrix=True)

and when `get_full_cov_matrix` is set to False, only the cross-covariance 
part would be returned.
Am I right?
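
To make the question concrete, here is a minimal sketch of how I read the 
proposal. The function name `cov_sketch`, the explicit `x`/`y` arguments and 
the rows-are-variables layout are my own assumptions for illustration; this 
is not an existing numpy signature.

```python
import numpy as np

def cov_sketch(x, y, get_full_cov_matrix=True):
    """Hypothetical illustration of the keyword idea (not the real np.cov).

    x, y : 2-D arrays, one variable per row, one observation per column.
    """
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)
    if get_full_cov_matrix:
        # Covariance of all stacked variables, shape (nx + ny, nx + ny).
        return np.cov(x, y)
    # Cross-covariance block only, shape (nx, ny): center each set of
    # variables and take a single matrix product.
    n_obs = x.shape[1]
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return np.dot(xc, yc.conj().T) / (n_obs - 1)
```

With the keyword set to False, only the (nx, ny) block is computed, which 
is where the saving compared to the full matrix would come from.
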
> If speed really matters to you guys then surely moving np.cov into C 
> would have more impact on 'saving the world' than this proposal. That 
> also ignores algorithm used ( 
> http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance). 
I didn't get your point about the algorithm here. Going by the 
nomenclature of that page, I would say that numpy.cov is based on a 
vectorized "two-pass algorithm", which computes the means first and then 
subtracts them before computing the matrix product. Would you do it 
differently?
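
For reference, this is how I picture the two-pass scheme. It is a 
simplified sketch of my reading, not the actual numpy source; the name 
`two_pass_cov` and the observations-in-rows layout are assumptions.

```python
import numpy as np

def two_pass_cov(X):
    """Two-pass covariance sketch. X: observations in rows, variables in columns."""
    X = np.asarray(X)
    n_obs = X.shape[0]
    # First pass over the data: the mean of each variable.
    mean = X.mean(axis=0)
    # Second pass: subtract the means, then one matrix product.
    Xc = X - mean
    return np.dot(Xc.T, Xc.conj()) / (n_obs - 1)
```

For real-valued data this should agree with `np.cov(X, rowvar=False)` up to 
rounding.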

> Actually np.cov also is deficient in that it does not have the dtype 
> argument so it is prone to numerical precision errors (especially 
> getting the mean of the array). Probably should be a ticket...
I'm not a specialist in numerical precision, but I was very impressed 
by the example raised on Jan 24th by Michael Aye, which was one of 
the first "real life" examples I've seen.

Looking at the cov algorithm, I first see a possibility to propagate 
an optional dtype argument to the mean computation.
However, I'm unsure what to do afterwards for the matrix product, since 
"dot(X.T, X.conj()) / fact" is also a sort of mean computation. 
Therefore it can also be affected by numerical precision issues. What 
would you suggest?
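
One possible reading of "propagating dtype", as a rough sketch: use the 
requested dtype for the mean and also cast the centered data to it, so 
that the matrix product accumulates in the same precision. The function 
name `cov_with_dtype` and the float64 default are assumptions of mine, 
and I am not claiming this is sufficient in all cases.

```python
import numpy as np

def cov_with_dtype(X, dtype=np.float64):
    """Covariance sketch with a dtype argument (hypothetical helper).

    X : observations in rows, variables in columns.
    """
    X = np.asarray(X)
    n_obs = X.shape[0]
    # The mean is accumulated in the requested dtype.
    mean = X.mean(axis=0, dtype=dtype)
    # Cast before centering so the subtraction and the product
    # are also carried out in the requested dtype.
    Xc = X.astype(dtype) - mean
    return np.dot(Xc.T, Xc.conj()) / (n_obs - 1)

# Example: float32 data with a large offset, processed in float64.
rng = np.random.default_rng(0)
X32 = (1000.0 + rng.normal(size=(10000, 2))).astype(np.float32)
c64 = cov_with_dtype(X32, dtype=np.float64)
```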

(The only solution I see would be to use the running variance algorithm. 
Since the code wouldn't be vectorized anymore, it would indeed 
benefit from going to C.)
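
A rough Python sketch of what I mean by the running algorithm, following 
the online co-moment update described on the Wikipedia page cited above. 
The name `running_cov` is hypothetical, and I assume real-valued data with 
at least two observations.

```python
import numpy as np

def running_cov(X):
    """Single-pass (online) covariance sketch. X: observations in rows."""
    X = np.asarray(X, dtype=float)
    n_obs, n_var = X.shape
    mean = np.zeros(n_var)
    comoment = np.zeros((n_var, n_var))
    for k, x in enumerate(X, start=1):
        delta = x - mean        # deviation from the mean before the update
        mean += delta / k       # running mean after seeing k observations
        # Co-moment update: old-mean deviation times new-mean deviation.
        comoment += np.outer(delta, x - mean)
    return comoment / (n_obs - 1)
```

The explicit loop over observations is exactly the part that is no longer 
vectorized, hence the interest of a C implementation.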

