[Numpy-discussion] speeding up y[:, i] for y.shape = (big number, small number)

David Cournapeau david at ar.media.kyoto-u.ac.jp
Thu Oct 5 01:08:33 CDT 2006


Bruce Southey wrote:
> Hi,
> I think what you are after is the multivariate normal distribution.
>   
Indeed
> Assuming I have it correctly, it is clearer to see (and probably more
> accurate to compute) in the log form as:
>
> -(N/2)*log(2*PI) - 0.5*log(determinant of V) - 0.5*(transpose of
> (x-mu))*inverse(V)*(x-mu)
>
> where N is the number of observations, PI is the mathematical constant, V
> is the known variance-covariance matrix, x is the vector of values, and
> mu is the known mean.
>   
Sure, but I need the exponential form at the end (actually, in the real 
code, you can choose between the log form and the 'standard' form, but I 
removed that to keep the example as simple as possible).
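For reference, the log form above translates into NumPy roughly as follows 
(just a sketch for a single observation x with a full covariance matrix; the 
function name is made up, it is not the real code):

import numpy as np

def mvn_logpdf(x, mu, V):
    # Log density of a single d-dimensional observation x under N(mu, V),
    # where V is the known variance-covariance matrix of shape (d, d).
    d = x.shape[0]
    diff = x - mu
    # (x - mu)' * inverse(V) * (x - mu), computed via a solve instead of an
    # explicit inverse.
    maha = np.dot(diff, np.linalg.solve(V, diff))
    return (-0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(V))
            - 0.5 * maha)
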
> If so, then you can vectorize the density calculation. 
I am not sure I understand what you mean: the computation is already 
vectorized in _diag_gauss_den; there is no loop there, and the function 
expects x to be of shape (n, d), where d is the dimension and n the 
number of samples. The loop in multiple_gaussian is not over samples but 
over densities: I need to compute the multivariate normal density on the 
same data but with different (vector) means and (diagonal) variances, and 
I don't see any easy way to vectorize that without huge memory usage 
(using rank-3 arrays). A toy reconstruction of what I mean is sketched below.
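
Roughly, the structure is something like this (a toy version only, not the 
actual functions):

import numpy as np

def _diag_gauss_den(x, mu, var):
    # Diagonal-covariance gaussian density, vectorized over the n samples.
    # x is (n, d); mu and var are length-d vectors.
    d = mu.shape[0]
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.prod(var))
    return norm * np.exp(-0.5 * np.sum((x - mu) ** 2 / var, axis=1))

def multiple_gaussian(x, mus, variances):
    # Evaluate k diagonal gaussians (one per row of mus/variances) on the
    # same data x. The loop is over the k densities, not over the samples.
    n = x.shape[0]
    k = mus.shape[0]
    out = np.empty((n, k))
    for i in range(k):
        out[:, i] = _diag_gauss_den(x, mus[i], variances[i])
    return out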

Anyway, the problem I am trying to understand here is not related to the 
gaussian kernel computation itself, but to the difference in cost between 
accessing rows and columns depending on the underlying storage (C or 
Fortran order). Don't look for flaws in _diag_gauss_den; it is just a toy 
example to make my point clearer.
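
To make the storage question concrete, this is the kind of comparison I have 
in mind (a rough illustration, the sizes are arbitrary; the .copy() forces 
the column to actually be read, since the slice itself is only a view):

import numpy as np
from timeit import timeit

y_c = np.zeros((1000000, 10))    # C order: rows are contiguous in memory
y_f = np.asfortranarray(y_c)     # Fortran order: columns are contiguous

t_c = timeit(lambda: y_c[:, 3].copy(), number=100)
t_f = timeit(lambda: y_f[:, 3].copy(), number=100)
print("column access, C order:       %.4f s" % t_c)
print("column access, Fortran order: %.4f s" % t_f)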

David



