[Numpy-discussion] speeding up y[:, i] for y.shape = (big number, small number)
David Cournapeau
david at ar.media.kyoto-u.ac.jp
Thu Oct 5 01:08:33 CDT 2006
Bruce Southey wrote:
> Hi,
> I think what you are after is the multivariate normal distribution.
>
Indeed
> Assuming I have it correctly, it is clearer to see (and probably more
> accurate to compute) in the log form as:
>
> -(N/2)*log(2*PI) - 0.5*log(determinant of V)
>     - 0.5*(transpose of (x-mu))*inverse(V)*(x-mu)
>
> where N is the number of observations, PI is the math constant, V is
> the known variance-covariance matrix, x is the vector of values, and
> mu is the known mean.
>
Sure, but I need the exponential form at the end (actually, in the real
code, you can choose between the log form and the 'standard' form, but I
removed this to keep the example as simple as possible).
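For concreteness, the quoted log form and its exponential translate
directly to NumPy. This is only a sketch (the function name is mine, and
the N of the quoted formula is taken here as the length of x, i.e. the
dimension):

import numpy as np

def mvn_logpdf(x, mu, V):
    # Log form of the quoted formula: x and mu are length-d 1d arrays,
    # V is a (d, d) covariance matrix.
    d = x.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(V)        # safer than log(det(V))
    maha = diff @ np.linalg.solve(V, diff)  # (x-mu)' * inverse(V) * (x-mu)
    return -0.5 * d * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * maha

# The 'standard' form is just the exponential of the log form:
x = np.array([0.5, -1.0])
mu = np.zeros(2)
V = np.eye(2)
print(np.exp(mvn_logpdf(x, mu, V)))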
> If so, then you can vectorize the density calculation.
I am not sure I understand what you mean: the computation is already
vectorized in _diag_gauss_den; there is no loop there, and the function
expects x to be of shape (n, d), where d is the dimension and n the
number of samples. The loop in multiple_gaussian is not over samples but
over densities: I need to compute the multivariate normal densities on
the same data but with different (vector) means and (diagonal)
variances, and I don't see any easy way to vectorize that without huge
memory usage (it would require rank-3 arrays).
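To make the shapes concrete, here is a sketch of that structure (the
names mirror the toy example, but the bodies are simplified, not the
actual code):

import numpy as np

def _diag_gauss_den(x, mu, va):
    # Diagonal-covariance Gaussian density, vectorized over samples:
    # x is (n, d), mu and va are (d,) (va holds the diagonal variances).
    # Returns an (n,) array of densities; no loop over the n samples.
    d = x.shape[1]
    norm = (2 * np.pi) ** (-d / 2.0) / np.sqrt(np.prod(va))
    q = np.sum((x - mu) ** 2 / va, axis=1)  # per-sample Mahalanobis term
    return norm * np.exp(-0.5 * q)

def multiple_gaussian(x, mus, vas):
    # Same data x evaluated under K different (mean, variance) pairs;
    # mus and vas are (K, d). The loop runs over the K densities, not
    # over the samples: vectorizing it as well would need an (n, K, d)
    # rank-3 intermediate, hence the memory concern.
    out = np.empty((x.shape[0], mus.shape[0]))
    for k in range(mus.shape[0]):
        out[:, k] = _diag_gauss_den(x, mus[k], vas[k])
    return out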
Anyway, the problem I am trying to understand here is not related to
Gaussian kernel computation, but rather to the difference in cost
between accessing rows and columns depending on the underlying storage
(C or Fortran). Don't try to find flaws in _diag_gauss_den; it is just a
toy example to make my point clearer.
David