[Numpy-discussion] Log Arrays

Anne Archibald peridot.faceted@gmail....
Thu May 8 11:08:32 CDT 2008


2008/5/8 David Cournapeau <cournape@gmail.com>:
> On Thu, May 8, 2008 at 10:20 PM, Charles R Harris
>  <charlesr.harris@gmail.com> wrote:
>  >
>  >
>  > Floating point numbers are essentially logs to base 2, i.e., integer
>  > exponent and mantissa between 1 and 2. What does using the log buy you?
>
>  Precision, of course. I am not sure I understand the notation base =
>  2, but doing computation in the so called log-domain is a must in many
>  statistical computations. In particular, in machine learning with
>  large datasets, it is common to have some points whose pdf is
>  extremely small, and well below the precision of double. Typically,
>  internally, the computations of my EM toolbox are done in the log
>  domain, and use the logsumexp trick to compute likelihood given some
>  data:

I'm not sure I'd describe this as precision, exactly; it's an issue of
numerical range. But yes, I've come across this while doing
maximum-likelihood fitting, and a coworker ran into it doing Bayesian
statistics. It definitely comes up.

Is "logarray" really the way to handle it, though? It seems like you
could probably get away with providing a logsum ufunc that did the
right thing. I mean, what operations does one want to do on logarrays?

add -> logsum
subtract -> ?
multiply -> add
mean -> logsum - log(N)
median -> median
exponentiate to recover normal-space values -> exp
str -> ?
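
The table above can be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing NumPy ufunc: the names `logsum` and `logdiff` are hypothetical, the max-shift is one common way to do the "right thing", and `logdiff` is just one possible answer to the "subtract" entry (it only makes sense when the first argument is the larger one).

```python
import numpy as np

def logsum(log_x, axis=None):
    """Hypothetical helper: log(sum(exp(log_x))) computed without underflow."""
    log_x = np.asarray(log_x, dtype=float)
    m = np.max(log_x, axis=axis, keepdims=True)
    # If the maximum is not finite (e.g. every entry is -inf), shift by 0
    # instead, so that we never compute inf - inf = nan.
    m = np.where(np.isfinite(m), m, 0.0)
    with np.errstate(divide="ignore"):  # log(0) = -inf is a valid answer here
        return np.log(np.sum(np.exp(log_x - m), axis=axis)) + np.squeeze(m, axis=axis)

def logdiff(log_a, log_b):
    """Hypothetical helper: log(exp(log_a) - exp(log_b)), assuming log_a > log_b."""
    return log_a + np.log1p(-np.exp(log_b - log_a))

log_p = np.array([-1000.0, -1001.0, -1002.0])   # exp() of these underflows to 0

total = logsum(log_p)                       # add      -> logsum
product = log_p[0] + log_p[1]               # multiply -> add
# Dividing by N in normal space is subtracting log(N) in log space.
mean = logsum(log_p) - np.log(log_p.size)
med = np.median(log_p)                      # median   -> median (log is monotonic)
```

The naive `np.log(np.sum(np.exp(log_p)))` returns `-inf` for this input, which is the range problem in question.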

I suppose numerical integration is also valuable, so it would help to
have a numerical integrator that was reasonably smart about working
with logs. (Though really it's just a question of rescaling and
exponentiating, I think.)
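
To make the "rescaling and exponentiating" idea concrete, here is a minimal sketch, assuming the integrand is available as log values sampled on a grid. The function name `log_trapz` is hypothetical, and the trapezoid rule is just the simplest integrator to show it with; any quadrature rule could be wrapped the same way.

```python
import numpy as np

def log_trapz(log_f, x):
    """Hypothetical helper: log of the trapezoid-rule integral of exp(log_f) over x."""
    log_f = np.asarray(log_f, dtype=float)
    x = np.asarray(x, dtype=float)
    m = np.max(log_f)            # rescale so the largest sample becomes exp(0) = 1
    f = np.exp(log_f - m)        # now safely representable in normal space
    area = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))
    return m + np.log(area)      # undo the rescaling in log space

# Example: an unnormalised Gaussian whose values are far below double range.
x = np.linspace(-10.0, 10.0, 2001)
log_f = -0.5 * x**2 - 2000.0
log_integral = log_trapz(log_f, x)   # close to 0.5*log(2*pi) - 2000
```

This sketch assumes at least one sample is finite; an integrand that is `-inf` everywhere would need the same kind of guard as a logsum.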

A "logarray" type would help by keeping track of the fact that its
contents were in log space, and would make expressions a little less
cumbersome, I guess. How much effort would it take to write it so that
it got all the corner cases right?
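
For a sense of scale, a bare-bones version might look like the sketch below. Everything here is hypothetical (the class name `LogArray`, its methods, and the choice to support only non-negative values); it handles the zero (`-inf`) corner case in addition and multiplication but leaves out subtraction, mixed operands, printing, and everything else a real type would need.

```python
import numpy as np

class LogArray:
    """Hypothetical wrapper that stores log(values) for non-negative values."""

    def __init__(self, log_values):
        self.log_values = np.asarray(log_values, dtype=float)

    @classmethod
    def from_values(cls, values):
        with np.errstate(divide="ignore"):   # log(0) = -inf is a legitimate entry
            return cls(np.log(np.asarray(values, dtype=float)))

    def __mul__(self, other):
        # multiply -> add in log space
        return LogArray(self.log_values + other.log_values)

    def __add__(self, other):
        # add -> pairwise logsum, shifted by the larger operand
        a, b = self.log_values, other.log_values
        m = np.maximum(a, b)
        m = np.where(np.isfinite(m), m, 0.0)  # avoid inf - inf when both are -inf
        with np.errstate(divide="ignore"):
            return LogArray(m + np.log(np.exp(a - m) + np.exp(b - m)))

    def exp(self):
        """Recover normal-space values (may underflow to 0)."""
        return np.exp(self.log_values)

a = LogArray.from_values([0.0, 2.0])
b = LogArray.from_values([0.0, 3.0])
total = (a + b).exp()      # normal-space sum
product = (a * b).exp()    # normal-space product
tiny = LogArray([-1000.0]) * LogArray([-1000.0])   # stays representable in log space
```

Even this much needs two explicit guards for zeros, which hints at how much corner-case work a complete type would involve.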

Anne

