[Numpy-discussion] Cross-covariance function
Pierre Haessig
pierre.haessig@crans....
Wed Feb 1 10:47:42 CST 2012
Hi Bruce,
Sorry for the delay in the answer.
Le 27/01/2012 17:28, Bruce Southey a écrit :
> The output is still a covariance so do we really need yet another set
> of very similar functions to maintain?
> Or can we get away with a new keyword?
>
The idea of an additional keyword seems appealing.
Just to make sure I understood it well, you woud be proposing a new
signature like :
def cov(.... get_full_cov_matrix=True)
and when `get_full_cov_matrix` is set to False, only the cross
covariance part would be returned.
Am I right ?
> If speed really matters to you guys then surely moving np.cov into C
> would have more impact on 'saving the world' than this proposal. That
> also ignores algorithm used (
> http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance).
>
>
I didn't get your point about the algorithm here. From this
nomenclature, I would say that numpy.cov is based on a vectorized
"two-pass algorithm" which computes the means first and then substracts
it before computing the matrix product. Would you make it different ?
> Actually np.cov also is deficient in that it does not have the dtype
> argument so it is prone to numerical precision errors (especially
> getting the mean of the array). Probably should be a ticket...
I'm not a specialist of numerical precisions, but I got very impressed
by the recent example raised on Jan 24th by Michael Aye which was one of
the first "real life" example I've seen.
The way I see the cov algorithm, I see first a possibility to propagate
an optional dtype argument to the mean computation.
However, I'm unsure about what to do after, for the matrix product since
"dot(X.T, X.conj()) / fact" is also a sort of mean computation.
Therefore it can also be affected by numerical precision issue. What
would you suggest ?
(the only solution I see would be to use the running variance algorithm.
Since the code wouldn't be vectorized anymore, this indeed would
benefits from going to C)
Best,
Pierre
More information about the NumPy-Discussion
mailing list