[Numpy-discussion] Log Arrays

Charles R Harris charlesr.harris@gmail....
Thu May 8 11:54:45 CDT 2008


On Thu, May 8, 2008 at 10:42 AM, David Cournapeau <cournape@gmail.com> wrote:

> On Fri, May 9, 2008 at 1:04 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
>
> > <  1e-308 ?
>
> Yes, all the time. I mean, if it were not, why would people bother with
> long double and co.? Why would denormals exist? I don't consider the
> comparison with the number of particles to be really relevant here.
> We are talking about implementation problems.
>
> >
> > Yes, logs can be useful there, but I still fail to see any precision
> > advantage. As I say, to all intents and purposes, IEEE floating point
> > *is* a logarithm. You will see that if you look at how log is implemented
> > in hardware. I'm less sure of the C floating point library because it
> > needs to be portable.
> >
> >>
> >> a = np.array([-1000., -1001.])
> >> np.log(np.sum(np.exp(a))) -> -inf
> >> -1000 + np.log(np.sum([1 + np.exp(-1)])) -> correct result
> >
> > What realistic probability is in the range exp(-1000)?
>
> Realistic in the sense of significant? None, of course. Realistic in the
> sense that it happens in computation? Certainly. Typically, when you do
> clustering and you have distant clusters, and you compute the
> probabilities of the points of one cluster relative to another one, you
> can quickly get several units from the mean. Add a small variance, and you
> can quickly get (x-mu)**2/sigma**2 around 1000.
>
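
The max-shift trick in the quoted example (subtract the largest log value before exponentiating, then add it back) can be sketched as below. This is a minimal illustration, not code from the thread; the function name `logsumexp` is our own choice here, although SciPy does ship a routine of the same name as `scipy.special.logsumexp`.

```python
import numpy as np


def logsumexp(a):
    """Return log(sum(exp(a))) using the max-shift trick to avoid underflow."""
    a = np.asarray(a, dtype=np.float64)
    m = np.max(a)
    if not np.isfinite(m):
        # All entries are -inf (or the max is +inf/nan): a - m would produce nan,
        # and m itself is already the correct answer in these cases.
        return m
    # After shifting, the largest term is exp(0) = 1, so the sum cannot underflow to 0.
    return m + np.log(np.sum(np.exp(a - m)))


a = np.array([-1000.0, -1001.0])
with np.errstate(divide="ignore"):
    # exp(-1000) underflows to 0.0, so this is log(0) = -inf
    naive = np.log(np.sum(np.exp(a)))
stable = logsumexp(a)  # -1000 + log(1 + exp(-1)), roughly -999.6867
```

The shift costs one extra pass to find the maximum, which is the price of keeping every intermediate value inside the float64 range.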

Yes, and Gaussians are a delusion beyond a few sigma. One of my pet peeves.
If you are more than 8 standard deviations out, then something is
fundamentally wrong in the concept and formulation. It is more likely that a
particle-sized black hole has whacked out some component of the experiment.


> You cannot just clip to 0, especially in online settings.
>
> >
> > If you have a hammer... It's portable, but there are wasted CPU cycles
> > in there. If speed were important, I suspect you could do better writing
> > a low-level function that assumed IEEE doubles and twiddled the bits.
>
> When you call a function in Python, you waste thousands of cycles on
> every call. Yet you use Python, and not assembly :) The above procedure
> is extremely standard and is used in all robust implementations of
> machine learning algorithms I am aware of; for example, it is implemented
> in HTK, a widely used HMM toolkit for speech recognition.
>
> Twiddling bits is all fun, but it takes time and is extremely
> error-prone. Also, I don't see exactly what kind of method you have in
> mind here: how would you do a logsumexp algorithm with bit twiddling?
>
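
To show where the log-domain procedure matters in an HMM setting, here is a minimal sketch of a pairwise log-add used inside a tiny forward algorithm. It is not HTK's code; the names `log_add` and `forward_log_likelihood` and the array layout are hypothetical. Current NumPy also provides `np.logaddexp`, which computes the same pairwise quantity as `log_add`.

```python
import numpy as np


def log_add(x, y):
    """Return log(exp(x) + exp(y)) without leaving the log domain."""
    if x < y:
        x, y = y, x  # make x the larger term
    if y == -np.inf:
        return x  # adding a zero probability changes nothing
    # y - x <= 0, so exp() cannot overflow and log1p keeps precision near 0
    return x + np.log1p(np.exp(y - x))


def forward_log_likelihood(log_pi, log_A, log_B):
    """Log-domain forward algorithm for a discrete-state HMM (sketch).

    log_pi: (S,) log initial state probabilities
    log_A:  (S, S) log transitions, log_A[i, j] = log P(state j | state i)
    log_B:  (T, S) log likelihood of each observation under each state
    Returns log P(observations).
    """
    T, S = log_B.shape
    alpha = log_pi + log_B[0]
    for t in range(1, T):
        new_alpha = np.empty(S)
        for j in range(S):
            acc = -np.inf  # log(0)
            for i in range(S):
                acc = log_add(acc, alpha[i] + log_A[i, j])
            new_alpha[j] = acc + log_B[t, j]
        alpha = new_alpha
    total = -np.inf
    for s in range(S):
        total = log_add(total, alpha[s])
    return total
```

With per-observation log likelihoods around -1000, the probability-domain recursion underflows to 0 after the first step, while this version keeps a finite result.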

You are complaining about inadequate range, but that is what scale factors
are for. Why compute exponentials and logs when all you need to do is store
an exponent in an integer?
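
One way to read the scale-factor suggestion: keep each value as a float mantissa plus a separate integer exponent, so products never leave the representable range. The sketch below assumes a hypothetical `ScaledFloat` class built on Python's `math.frexp` and `math.ldexp`; it is an illustration of the idea, not code proposed in the thread.

```python
import math


class ScaledFloat:
    """Hypothetical helper: value = mantissa * 2**exponent, exponent kept as a Python int."""

    __slots__ = ("mantissa", "exponent")

    def __init__(self, value, exponent=0):
        # frexp gives m with 0.5 <= |m| < 1 (or m == 0) and int e with value == m * 2**e
        m, e = math.frexp(value)
        self.mantissa = m
        self.exponent = exponent + e if m != 0.0 else 0

    def __mul__(self, other):
        # Multiply mantissas (product stays >= 0.25 in magnitude), add exponents, renormalize.
        return ScaledFloat(self.mantissa * other.mantissa, self.exponent + other.exponent)

    def __add__(self, other):
        if self.mantissa == 0.0:
            return other
        if other.mantissa == 0.0:
            return self
        big, small = (self, other) if self.exponent >= other.exponent else (other, self)
        # Align the smaller operand to the larger exponent; a huge shift underflows to 0.0.
        aligned = math.ldexp(small.mantissa, small.exponent - big.exponent)
        return ScaledFloat(big.mantissa + aligned, big.exponent)

    def log(self):
        # log(m * 2**e) = log(m) + e * log(2); valid only for positive values.
        return math.log(self.mantissa) + self.exponent * math.log(2.0)


# Product of four probabilities of 1e-300: about 1e-1200, far below the float64 range.
p = ScaledFloat(1.0)
for _ in range(4):
    p = p * ScaledFloat(1e-300)
log_p = p.log()  # close to 4 * log(1e-300)
```

Multiplication here is cheap (no exp or log calls), while addition needs an exponent alignment step; that trade-off is roughly the one being argued about above.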

Chuck