[Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy)
Sat Oct 17 10:48:37 CDT 2009
On Sat, Oct 17, 2009 at 8:36 AM, per freem <firstname.lastname@example.org> wrote:
> hi all,
> in my code, i use the function 'logsumexp' from scipy.maxentropy a
> lot. as far as i can tell, this function has no vectorized version
> that works on an m-x-n matrix. i might be doing something wrong here,
> but i found that this function can run extremely slowly if used as
> follows: i have an array of log probability vectors, such that each
> column sums to one. i want to simply iterate over each column and
> renormalize it, using exp(col - logsumexp(col)). here is the code that
> i used to profile this operation:
> from scipy import *
> from numpy import *
> from numpy.random.mtrand import dirichlet
> from scipy.maxentropy import logsumexp
> import time
> # build an array of probability vectors. each column represents a
> probability vector.
> num_vectors = 1000000
> log_prob_vectors = transpose(log(dirichlet([1, 1, 1], num_vectors)))
> # now renormalize each column, using logsumexp
> norm_prob_vectors = 
> t1 = time.time()
> for n in range(num_vectors):
> norm_p = exp(log_prob_vectors[:, n] - logsumexp(log_prob_vectors[:, n]))
> t2 = time.time()
> norm_prob_vectors = array(norm_prob_vectors)
> print "logsumexp renormalization (%d many times) took %s seconds."
> %(num_vectors, str(t2-t1))
> i found that even with only 100,000 elements, this code takes about 5 seconds:
> logsumexp renormalization (100000 many times) took 5.07085394859 seconds.
> with 1 million elements, it becomes prohibitively slow:
> logsumexp renormalization (1000000 many times) took 70.7815010548 seconds.
> is there a way to speed this up? most vectorized operations that work
> on matrices in numpy/scipy are incredibly fast and it seems like a
> vectorized version of logsumexp should be near instant on this scale.
> is there a way to rewrite the above snippet so that it's faster?
> thanks very much for your help.
Here's logsumexp from scipy:
a = asarray(a)
a_max = a.max()
return a_max + log((exp(a-a_max)).sum())
Would this work:
a = asarray(a)
a_max = a.max(axis=0)
return a_max + log((exp(a-a_max)).sum(axis=0))
More information about the NumPy-Discussion