# [Numpy-discussion] speeding up y[:, i] for y.shape = (big number, small number)

David Cournapeau david at ar.media.kyoto-u.ac.jp
Wed Oct 4 05:58:25 CDT 2006

Tim Hochberg wrote:
>>
>>
>> I guess the problem is coming from the fact that y being C order, y[:,
>> i] needs accessing data in a non 'linear' way. Is there a way to speed
>> this up ? I did something like this:
>>
>>     y   = N.zeros((K, n))
>>     for i in range(K):
>>         y[i] = gauss_den(data, mu[i, :], va[i, :])
>>     return y.T
>>
>> which works, but I don't like it very much.
> Why not?
>
Mainly because those transposes do not really reflect the intention, and
they do not seem natural.
>> Isn't there any other way?
> That depends on the details of gauss_den.
>
> A general note: for this kind of microoptimization puzzle, it's much
> easier to help if you can post a self contained example, preferably
> something fairly simple that still illustrates the speed issue, that we
> can experiment with.
>
Here we are (the difference may not seem that big between the two
multiple_gauss_den versions, but in reality _diag_gauss_den is an
internal function implemented in C, and is much faster... While writing
this example, I realized that _diag_gauss_den itself may be slow for
exactly the same reason):

#! /usr/bin/env python
# Last Change: Wed Oct 04 07:00 PM 2006 J

import numpy as N
from numpy.random import randn

def _diag_gauss_den(x, mu, va):
    """ This function is the actual implementation
    of the Gaussian pdf in the scalar case. It assumes all args
    are conformant, so it should not be used directly."""
    # Diagonal covariance matrix case: accumulate the exponent one
    # dimension at a time, folding the normalization terms into fac.
    d       = mu.size
    inva    = 1/va[0]
    fac     = (2*N.pi) ** (-d/2.0) * N.sqrt(inva)
    y       = (x[:, 0] - mu[0]) ** 2 * inva * -0.5
    for i in range(1, d):
        inva    = 1/va[i]
        fac     *= N.sqrt(inva)
        y       += (x[:, i] - mu[i]) ** 2 * inva * -0.5
    return fac * N.exp(y)

def multiple_gauss_den1(data, mu, va):
    """Helper function to generate several Gaussian
    pdf (different parameters) from the same data: unoptimized version"""
    K   = mu.shape[0]
    n   = data.shape[0]
    d   = data.shape[1]

    y   = N.zeros((n, K))
    for i in range(K):
        # column assignment: strided writes into a C-order array
        y[:, i] = _diag_gauss_den(data, mu[i, :], va[i, :])

    return y

def multiple_gauss_den2(data, mu, va):
    """Helper function to generate several Gaussian
    pdf (different parameters) from the same data: optimized version"""
    K   = mu.shape[0]
    n   = data.shape[0]
    d   = data.shape[1]

    y   = N.zeros((K, n))
    for i in range(K):
        # row assignment: each write is one contiguous block
        y[i] = _diag_gauss_den(data, mu[i, :], va[i, :])

    return y.T

def bench():
    #===========================================
    # GMM of 30 comp, 15 dimension, 1e4 frames
    #===========================================
    d       = 15
    k       = 30
    nframes = int(1e4)
    niter   = 10
    mode    = 'diag'

    mu      = randn(k, d)
    va      = randn(k, d) ** 2
    X       = randn(nframes, d)

    print "============================================================="
    print "(%d dim, %d components) GMM with %d iterations, for %d frames" \
          % (d, k, niter, nframes)

    for i in range(niter):
        y1  = multiple_gauss_den1(X, mu, va)

    for i in range(niter):
        y2  = multiple_gauss_den2(X, mu, va)

    # sanity check: both versions should give identical values
    se  = N.sum(y1 - y2)

    print se

if __name__ == '__main__':
    import hotshot, hotshot.stats
    profile_file    = 'foo.prof'
    prof    = hotshot.Profile(profile_file, lineevents=1)
    prof.runcall(bench)
    prof.close()
    # load the finished profile and show the 20 most expensive entries
    stats   = hotshot.stats.load(profile_file)
    stats.sort_stats('cumulative').print_stats(20)
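
A quick way to see where the cost comes from is to look at the strides of
the two layouts. This is a small sketch of my own for illustration (same
shapes as in bench() above; the stride values assume float64):

import numpy as N

y = N.zeros((10000, 30))   # C order: each row is contiguous in memory
print y.strides            # (240, 8): column neighbours sit 240 bytes apart
print y.T.strides          # (8, 240): the transpose is a view, no copy is made

So y[i] = v fills one contiguous block of 30 doubles, while y[:, i] = v
scatters 10000 writes 240 bytes apart, which is far less cache friendly;
that is why multiple_gauss_den2, which fills rows and transposes at the
end, wins.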

I am a bit puzzled by all this C vs F storage business, though. In Matlab,
where the storage is always F as far as I know, I have never encountered
such differences (e.g. between y(i, :) and y(:, i)); I don't know if this
is because I am doing it badly, or because Matlab is much more clever than
numpy at handling those cases, or if that is the price to pay for the
flexibility of supporting both storage orders.
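
If someone wants a middle ground, one possibility I have not benchmarked
would be to allocate y in Fortran order up front, so that each column is
contiguous and y[:, i] becomes a linear write. A minimal sketch
(multiple_gauss_den3 is just a name for this hypothetical variant; it
assumes N.zeros accepts the order keyword):

def multiple_gauss_den3(data, mu, va):
    K   = mu.shape[0]
    n   = data.shape[0]

    y   = N.zeros((n, K), order='F')   # F order: columns are contiguous
    for i in range(K):
        # with F order this column assignment is a contiguous write
        y[:, i] = _diag_gauss_den(data, mu[i, :], va[i, :])

    return y

The returned array is F-contiguous rather than C-contiguous, so whether
this helps overall depends on how y is accessed afterwards.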