[Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy)

per freem perfreem@gmail....
Sat Oct 17 10:36:26 CDT 2009


hi all,

in my code, i use the function 'logsumexp' from scipy.maxentropy a
lot. as far as i can tell, this function has no vectorized version
that works on an m-x-n matrix. i might be doing something wrong here,
but i found that it runs extremely slowly when used as follows: i have
an array of log probability vectors, one per column, such that each
column sums to one after exponentiation. i want to simply iterate over
each column and renormalize it, using exp(col - logsumexp(col)). here
is the code that i used to profile this operation:

from scipy import *
from numpy import *
from numpy.random.mtrand import dirichlet
from scipy.maxentropy import logsumexp
import time

# build an array of log probability vectors.  each column represents a
# probability vector.
num_vectors = 1000000
log_prob_vectors = transpose(log(dirichlet([1, 1, 1], num_vectors)))
# now renormalize each column, using logsumexp
norm_prob_vectors = []
t1 = time.time()
for n in range(num_vectors):
    norm_p = exp(log_prob_vectors[:, n] - logsumexp(log_prob_vectors[:, n]))
    norm_prob_vectors.append(norm_p)
t2 = time.time()
norm_prob_vectors = array(norm_prob_vectors)
print("logsumexp renormalization (%d many times) took %s seconds."
      % (num_vectors, str(t2 - t1)))

i found that even with only 100,000 vectors, this code takes about 5 seconds:

logsumexp renormalization (100000 many times) took 5.07085394859 seconds.

with 1 million vectors, it becomes prohibitively slow:

logsumexp renormalization (1000000 many times) took 70.7815010548 seconds.

is there a way to speed this up? most vectorized operations that work
on matrices in numpy/scipy are incredibly fast and it seems like a
vectorized version of logsumexp should be near instant on this scale.
is there a way to rewrite the above snippet so that it's faster?
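
for reference, here is a minimal sketch of what a vectorized version might
look like. it is not the scipy.maxentropy implementation: the helper names
logsumexp_columns and renormalize_columns are made up for illustration, and
the sketch assumes a 2-d array with one log probability vector per column and
no column made up entirely of -inf. it subtracts each column's maximum before
exponentiating (the usual trick for numerical stability) and uses
numpy.random.default_rng, which needs a reasonably recent numpy.

```python
import numpy as np

def logsumexp_columns(log_p):
    """Column-wise log(sum(exp(log_p))) for a 2-D array (hypothetical helper)."""
    # Subtract each column's maximum so the largest exponent is exp(0) = 1.
    col_max = log_p.max(axis=0)
    return col_max + np.log(np.exp(log_p - col_max).sum(axis=0))

def renormalize_columns(log_p):
    """Return exp(col - logsumexp(col)) for every column at once."""
    # Broadcasting subtracts the per-column normalizer from each row.
    return np.exp(log_p - logsumexp_columns(log_p))

# Small example: 1000 log probability vectors of length 3, one per column.
rng = np.random.default_rng(0)
log_prob_vectors = np.log(rng.dirichlet([1, 1, 1], 1000)).T
norm_prob_vectors = renormalize_columns(log_prob_vectors)
```

note that the result keeps the (3, num_vectors) layout of the input, whereas
the loop above builds a (num_vectors, 3) array, so it would need a transpose
to match. newer scipy releases also provide scipy.special.logsumexp, which
accepts an axis argument and may make a hand-written helper unnecessary.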

thanks very much for your help.

