[Numpy-discussion] Cross-covariance function

Pierre Haessig pierre.haessig@crans....
Wed Feb 1 10:47:42 CST 2012


Hi Bruce,
Sorry for the delay in the answer.

Le 27/01/2012 17:28, Bruce Southey a écrit :
> The output is still a covariance so do we really need yet another set 
> of very similar functions to maintain?
> Or can we get away with a new keyword?
>
The idea of an additional keyword seems appealing.
Just to make sure I understood it well, you woud be proposing a new 
signature like :
def cov(.... get_full_cov_matrix=True)

and when `get_full_cov_matrix` is set to False, only the cross 
covariance part would be returned.
Am I right ?
> If speed really matters to you guys then surely moving np.cov into C 
> would have more impact on 'saving the world' than this proposal. That 
> also ignores algorithm used ( 
> http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance). 
>
>
I didn't get your point about the algorithm here. From this 
nomenclature, I would say that numpy.cov is based on a vectorized 
"two-pass algorithm" which computes the means first and then substracts 
it before computing the matrix product. Would you make it different ?

> Actually np.cov also is deficient in that it does not have the dtype 
> argument so it is prone to numerical precision errors (especially 
> getting the mean of the array). Probably should be a ticket...
I'm not a specialist of numerical precisions, but I got very impressed 
by the recent example raised on Jan 24th by Michael Aye which was one of 
the first "real life" example I've seen.

The way I see the cov algorithm, I see first a possibility to propagate 
an optional dtype argument to the mean computation.
However, I'm unsure about what to do after, for the matrix product since 
"dot(X.T, X.conj()) / fact" is also a sort of mean computation. 
Therefore it can also be affected by numerical precision issue. What 
would you suggest ?

(the only solution I see would be to use the running variance algorithm. 
Since the code wouldn't be vectorized anymore, this indeed would 
benefits from going to C)

Best,
Pierre




More information about the NumPy-Discussion mailing list