[Numpy-discussion] Log Arrays
Thu May 8 11:42:50 CDT 2008
On Fri, May 9, 2008 at 1:04 AM, Charles R Harris
> < 1e-308 ?
Yes, all the time. I mean, if it were not, why would people bother with
long double and co? Why would denormals exist? I don't consider the
comparison with the number of particles to be really relevant here.
We are talking about implementation problems.
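A quick illustration of where double precision actually bottoms out, assuming NumPy's float64 is an IEEE 754 double (true on common platforms). Subnormals (denormals) extend the range below the smallest normal value of about 2.2e-308, but exp(-1000) is far smaller than even the smallest subnormal:

```python
import numpy as np

# Smallest positive *normal* double (about 2.2e-308).
tiny = np.finfo(np.float64).tiny
# Smallest positive *subnormal* (denormal) double (about 4.9e-324).
smallest_sub = np.nextafter(np.float64(0.0), np.float64(1.0))

print(tiny)          # 2.2250738585072014e-308
print(smallest_sub)  # 5e-324

# Values between the two exist only as subnormals, with fewer significant bits.
x = np.float64(1e-310)
print(0 < x < tiny)  # True

# exp(-1000) is roughly 1e-435, below even the subnormal range,
# so it underflows to exactly 0.0.
print(np.exp(-1000.0))  # 0.0
```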
> Yes, logs can be useful there, but I still fail to see any precision
> advantage. As I say, to all intents and purposes, IEEE floating point *is* a
> logarithm. You will see that if you look at how log is implemented in
> hardware. I'm less sure of the C floating point library because it needs to
> be portable.
>> a = np.array([-1000., -1001.])
>> np.log(np.sum(np.exp(a))) -> -inf
>> -1000 + np.log(np.sum([1 + np.exp(-1)])) -> correct result
> What realistic probability is in the range exp(-1000) ?
Realistic in the sense of significant? None, of course. Realistic in the
sense that it happens in computation? Certainly. Typically, when you do
clustering with distant clusters and compute the probabilities of the
points of one cluster relative to another one, you quickly get points
several units away from the mean. Add a small variance, and
(x-mu)**2/sigma**2 can quickly reach 1000.
You cannot just clip to 0, especially in online settings.
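To make the clustering case concrete, here is a minimal sketch with a hypothetical 1-D setup (two equally weighted Gaussian clusters at 0 and 100, unit variance, and a point at 45; none of these numbers come from the thread). Both densities underflow to 0.0, so the naive posterior is 0/0, while the log-domain version, which subtracts the maximum before exponentiating, stays well defined:

```python
import numpy as np

# Hypothetical setup: two distant clusters, equal priors, unit variance.
means = np.array([0.0, 100.0])
sigma = 1.0
x = 45.0  # far from both centres: (x - mu)**2 / sigma**2 is 2025 and 3025

# Log Gaussian density of x under each cluster (equal priors cancel).
log_p = -0.5 * ((x - means) / sigma) ** 2 - 0.5 * np.log(2 * np.pi * sigma**2)

# Naive posterior: both exp() calls underflow to 0.0, so we get 0/0 = nan.
with np.errstate(invalid="ignore"):
    p = np.exp(log_p)
    naive = p / p.sum()
print(naive)  # [nan nan]

# Log-domain posterior: shift by the max so the largest term is exp(0) = 1.
m = log_p.max()
log_norm = m + np.log(np.sum(np.exp(log_p - m)))
posterior = np.exp(log_p - log_norm)
print(posterior)  # first entry 1.0, second roughly exp(-500), about 7e-218
```

Clipping the underflowed densities to some small constant instead would make the two clusters look equally likely, which is exactly the wrong answer here.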
> If you have a hammer... It's portable, but there are wasted cpu cycles in
> there. If speed was important, I suspect you could do better writing a low
> level function that assumed IEEE doubles and twiddled the bits.
When you call a function in Python, you waste thousands of cycles on
every call. Yet you use Python, and not assembly :) The above procedure
is extremely standard and is used in every robust implementation of
machine learning algorithms I am aware of; for example, it is implemented
in HTK, a widely used HMM toolkit for speech recognition.
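The procedure in the quoted `-1000 + np.log(...)` line generalizes to the usual max-shifted logsumexp. Below is a minimal sketch (the function name and `axis` handling are choices made for this example, not taken from the thread); current NumPy's `np.logaddexp.reduce` and SciPy's `scipy.special.logsumexp` compute the same quantity:

```python
import numpy as np

def logsumexp(a, axis=None):
    """Compute log(sum(exp(a))) along `axis` without spurious under/overflow.

    Subtracting the max makes the largest exponentiated term exactly 1,
    so the sum is in [1, n] and its log is well conditioned.
    """
    a = np.asarray(a, dtype=float)
    a_max = np.max(a, axis=axis, keepdims=True)
    # If the max is not finite (e.g. all entries -inf), shift by 0 instead
    # to avoid computing (-inf) - (-inf) = nan.
    a_max = np.where(np.isfinite(a_max), a_max, 0.0)
    s = np.sum(np.exp(a - a_max), axis=axis, keepdims=True)
    out = np.log(s) + a_max
    # Drop the reduced dimension(s) kept by keepdims=True.
    return np.squeeze(out, axis=axis)

a = np.array([-1000.0, -1001.0])

# Naive version: exp() underflows to 0 and log(0) gives -inf.
with np.errstate(divide="ignore"):
    naive = np.log(np.sum(np.exp(a)))
print(naive)                    # -inf

print(logsumexp(a))             # about -999.6867, i.e. -1000 + log(1 + exp(-1))
print(np.logaddexp.reduce(a))   # same value via NumPy's ufunc
```

The `axis` argument makes the same function usable for normalizing per-point cluster log-probabilities stored row-wise in a 2-D array.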
Twiddling bits is fun, but it takes time and is extremely error-prone.
Also, I don't see exactly what kind of method you have in mind here:
how would you implement a logsumexp algorithm with bit twiddling?
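The remark that IEEE floating point "is" a logarithm can be made concrete with `np.frexp`, which splits a double into a mantissa in [0.5, 1) and a base-2 exponent. A small sketch (the value 3e-300 is arbitrary) also shows the limit of that view: the exponent field stops at -1073 (for the smallest subnormal, in `frexp`'s convention), while exp(-1000) would need an exponent around -1443:

```python
import numpy as np

# A double is stored as m * 2**e; np.frexp returns m in [0.5, 1) and e.
x = np.float64(3.0e-300)
m, e = np.frexp(x)
# log2(x) = e + log2(m): the exponent field acts as a coarse base-2 log.
print(np.isclose(np.log2(x), e + np.log2(m)))  # True

# The exponent field is bounded: the smallest subnormal, 2**-1074,
# is returned by frexp as 0.5 * 2**-1073.
smallest = np.nextafter(0.0, 1.0)
print(np.frexp(smallest)[1])  # -1073

# exp(-1000) = 2**(-1000 / ln 2), i.e. an exponent of about -1443,
# which no double can hold, hence the underflow to 0.0.
print(-1000 / np.log(2))
```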