[Numpy-discussion] vectorized version of logsumexp? (from scipy.maxentropy)

Keith Goodman kwgoodman@gmail....
Sat Oct 17 10:48:37 CDT 2009


On Sat, Oct 17, 2009 at 8:36 AM, per freem <perfreem@gmail.com> wrote:
> hi all,
>
> in my code, i use the function 'logsumexp' from scipy.maxentropy a
> lot. as far as i can tell, this function has no vectorized version
> that works on an m-x-n matrix. i might be doing something wrong here,
> but i found that this function can run extremely slowly if used as
> follows: i have an array of log probability vectors, such that each
> column sums to one. i want to simply iterate over each column and
> renormalize it, using exp(col - logsumexp(col)). here is the code that
> i used to profile this operation:
>
> from scipy import *
> from numpy import *
> from numpy.random.mtrand import dirichlet
> from scipy.maxentropy import logsumexp
> import time
>
> # build an array of probability vectors.  each column represents a
> probability vector.
> num_vectors = 1000000
> log_prob_vectors = transpose(log(dirichlet([1, 1, 1], num_vectors)))
> # now renormalize each column, using logsumexp
> norm_prob_vectors = []
> t1 = time.time()
> for n in range(num_vectors):
>    norm_p = exp(log_prob_vectors[:, n] - logsumexp(log_prob_vectors[:, n]))
>    norm_prob_vectors.append(norm_p)
> t2 = time.time()
> norm_prob_vectors = array(norm_prob_vectors)
> print "logsumexp renormalization (%d many times) took %s seconds."
> %(num_vectors, str(t2-t1))
>
> i found that even with only 100,000 elements, this code takes about 5 seconds:
>
> logsumexp renormalization (100000 many times) took 5.07085394859 seconds.
>
> with 1 million elements, it becomes prohibitively slow:
>
> logsumexp renormalization (1000000 many times) took 70.7815010548 seconds.
>
> is there a way to speed this up? most vectorized operations that work
> on matrices in numpy/scipy are incredibly fast and it seems like a
> vectorized version of logsumexp should be near instant on this scale.
> is there a way to rewrite the above snippet so that it's faster?
>
> thanks very much for your help.

Here's logsumexp from scipy:

def logsumexp(a):
    a = asarray(a)
    a_max = a.max()
    return a_max + log((exp(a-a_max)).sum())

Would this work:

def logsumexp2(a):
    a = asarray(a)
    a_max = a.max(axis=0)
    return a_max + log((exp(a-a_max)).sum(axis=0))

?


More information about the NumPy-Discussion mailing list