[Numpy-discussion] Log Arrays
Thu May 8 11:08:32 CDT 2008
2008/5/8 David Cournapeau <firstname.lastname@example.org>:
> On Thu, May 8, 2008 at 10:20 PM, Charles R Harris
> <email@example.com> wrote:
> > Floating point numbers are essentially logs to base 2, i.e., integer
> > exponent and mantissa between 1 and 2. What does using the log buy you?
> Precision, of course. I am not sure I understand the notation base =
> 2, but doing computation in the so called log-domain is a must in many
> statistical computations. In particular, in machine learning with
> large datasets, it is common to have some points whose pdf is
> extremely small, and well below the precision of double. Typically,
> internally, the computations in my EM toolbox are done in the log
> domain and use the logsumexp trick to compute the likelihood given some
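The logsumexp trick mentioned in the quote can be sketched as follows (a minimal hand-rolled version, not any particular library's implementation): shifting by the maximum keeps the exponentials in a representable range, and the shift is added back afterwards.

```python
import numpy as np

def logsumexp(log_values):
    """Compute log(sum(exp(log_values))) without underflow.

    Naively exponentiating values around -1000 gives 0.0 in double
    precision; subtracting the maximum first keeps at least one term
    equal to exp(0) = 1, and the shift is restored at the end.
    """
    m = np.max(log_values)
    return m + np.log(np.sum(np.exp(log_values - m)))

# Log-probabilities far below double-precision range stay usable:
log_p = np.array([-1000.0, -1001.0, -1002.0])
total = logsumexp(log_p)  # log(p1 + p2 + p3), still finite
```

A naive `np.log(np.sum(np.exp(log_p)))` on the same input would underflow to `-inf`, which is exactly the problem described above.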
I'm not sure I'd describe this as precision, exactly; it's an issue of
numerical range. But yes, I've come across this while doing
maximum-likelihood fitting, and a coworker ran into it doing Bayesian
statistics. It definitely comes up.
Is "logarray" really the way to handle it, though? It seems like you
could probably get away with providing a logsum ufunc that does the
right thing. I mean, what operations does one want to do on logarrays?
add -> logsum
subtract -> ?
multiply -> add
mean -> logsum/N
median -> median
exponentiate to recover normal-space values -> exp
str -> ?
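The mapping above can be made concrete with a short sketch (the names `logadd` and `logsub` are illustrative, not existing NumPy functions; `logsub` fills in one plausible answer to the "subtract -> ?" entry, valid only when the difference is positive):

```python
import numpy as np

def logadd(la, lb):
    """log(a + b) from la = log(a), lb = log(b), staying in log space."""
    m = np.maximum(la, lb)
    return m + np.log(np.exp(la - m) + np.exp(lb - m))

def logsub(la, lb):
    """log(a - b) from la = log(a), lb = log(b); requires a > b."""
    return la + np.log1p(-np.exp(lb - la))

la, lb = np.log(0.3), np.log(0.2)
s = logadd(la, lb)   # add in normal space -> logsum in log space
d = logsub(la, lb)   # subtract -> logsub (defined only for a > b)
p = la + lb          # multiply in normal space -> plain addition of logs
```

Mean and median follow the same pattern: the mean is `logsumexp(x) - log(N)` in log space, and the median is order-preserving, so it commutes with the log.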
I suppose numerical integration is also valuable, so it would help to
have a numerical integrator that was reasonably smart about working
with logs. (Though really it's just a question of rescaling and
exponentiating, I think.)
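The rescale-and-exponentiate idea for integration can be sketched like this (the helper name `log_trapz` is hypothetical; it is an ordinary trapezoid rule applied after shifting by the maximum log value, so the integrand never underflows):

```python
import numpy as np

def log_trapz(log_f, x):
    """log of the trapezoid-rule integral of exp(log_f) over the grid x.

    Subtracting max(log_f) before exponentiating rescales the integrand
    so its peak is exp(0) = 1; the shift is added back at the end.
    """
    m = np.max(log_f)
    g = np.exp(log_f - m)
    integral = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(x))
    return m + np.log(integral)

# Example: a Gaussian log-density shifted far into the underflow regime.
x = np.linspace(-5.0, 5.0, 2001)
log_f = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi) - 2000.0
log_area = log_trapz(log_f, x)  # close to -2000, the log of the true mass
```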
A "logarray" type would help by keeping track of the fact that its
contents were in log space, and would make expressions a little less
cumbersome, I guess. How much effort would it take to write it so that
it got all the corner cases right?