[Numpy-discussion] Log Arrays

Charles R Harris charlesr.harris@gmail....
Thu May 8 21:22:43 CDT 2008


On Thu, May 8, 2008 at 2:37 PM, Warren Focke <focke@slac.stanford.edu>
wrote:

>
>
> On Thu, 8 May 2008, Charles R Harris wrote:
>
> > On Thu, May 8, 2008 at 11:46 AM, Anne Archibald
> > <peridot.faceted@gmail.com> wrote:
> >
> >> 2008/5/8 Charles R Harris <charlesr.harris@gmail.com>:
> >>>
> >>> On Thu, May 8, 2008 at 10:56 AM, Robert Kern
> >>> <robert.kern@gmail.com> wrote:
> >>>>
> >>>> When you're running an optimizer over a PDF, you will be stuck in the
> >>>> region of exp(-1000) for a substantial amount of time before you get
> >>>> to the peak. If you don't use the log representation, you will never
> >>>> get to the peak because all of the gradient information is lost to
> >>>> floating point error. You can consult any book on computational
> >>>> statistics for many more examples. This is a long-established best
> >>>> practice in statistics.
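
To make the underflow concrete: evaluated far enough out in a tail, a
raw density is exactly 0.0 in double precision, so an optimizer sees a
zero gradient, while the log-density still has a usable slope. A
minimal NumPy sketch (the normal density here is just an illustration):

    import numpy as np

    def norm_pdf(x):
        return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

    def norm_logpdf(x):
        return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

    x = 50.0
    print(norm_pdf(x), norm_pdf(x + 1))        # 0.0 0.0 -- underflow,
                                               # no gradient information
    print(norm_logpdf(x), norm_logpdf(x + 1))  # -1250.9 -1301.4 -- a
                                               # clear slope remains
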
> >>>
> >>> But IEEE is already a log representation. You aren't gaining
> >>> precision, you are gaining more bits in the exponent at the expense
> >>> of fewer bits in the mantissa, i.e., less precision. As I say, it
> >>> may be convenient, but if cpu cycles matter, it isn't efficient.
> >>
> >> Efficiency is not the point here. IEEE floats simply cannot represent
> >> the difference between exp(-1000) and exp(-1001). This difference can
> >> matter in many contexts.
> >>
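This is easy to check: both probabilities below underflow to exactly
0.0 in IEEE double precision, while in log space the factor-of-e
difference survives, and sums of such terms can be accumulated with
logaddexp. A quick sketch:

    import numpy as np

    print(np.exp(-1000.0), np.exp(-1001.0))  # 0.0 0.0 -- indistinguishable
    print(np.logaddexp(-1000.0, -1001.0))    # about -999.687, i.e.
                                             # log(exp(-1000) + exp(-1001))
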
> >> For example, suppose I have observed a source in X-rays and I want to
> >> fit a blackbody spectrum. I have, say, hundreds of spectral bins, with
> >> a number of photons in each. For any temperature T and amplitude A, I
> >> can compute the distribution of photons in each bin (Poisson with mean
> >> depending on T and A), and obtain the probability of obtaining the
> >> observed number of photons in each bin. Even if a blackbody with
> >> temperature T and amplitude A is a perfect fit, I should expect this
> >> [...]
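
The per-bin likelihood Anne describes is easy to write down entirely in
log space; a minimal sketch (the counts and model means below are
placeholders, not her data):

    import numpy as np
    from scipy.special import gammaln

    def poisson_loglike(counts, mu):
        # Sum over bins of ln P(k | mu) = k*ln(mu) - mu - ln(k!).
        # The raw product of hundreds of per-bin probabilities would
        # underflow long before this sum loses accuracy.
        counts = np.asarray(counts, dtype=float)
        mu = np.asarray(mu, dtype=float)
        return np.sum(counts * np.log(mu) - mu - gammaln(counts + 1.0))

    counts = np.array([3, 0, 5, 2])        # observed photons per bin
    mu = np.array([2.5, 0.8, 4.1, 2.2])    # model means for some (T, A)
    print(poisson_loglike(counts, mu))
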
> >
> >
> > This is an interesting problem, partly because I was one of the first
> > guys to use synthetic spectra (NO, OH) to fit temperatures to spectral
> > data, but also because it might look like the Poisson matters. But
> > probably not: the curve fitting effectively aggregates data and the
> > central limit theorem kicks in, so the normal least-squares approach
> > will probably work. Also, if the number of photons in a bin is much
> > more than about 10, the computation of the Poisson distribution
> > probably uses a Gaussian, with perhaps a small adjustment for the
> > tail. So I'm curious, did you find any significant difference trying
> > both methods?
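
The "Gaussian above ~10 counts" rule of thumb is easy to check
numerically; a sketch comparing the two directly (no tail correction
applied):

    import numpy as np
    from scipy.special import gammaln

    mu = 10.0
    k = np.arange(0, 30)

    poisson = np.exp(k * np.log(mu) - mu - gammaln(k + 1.0))
    gauss = np.exp(-0.5 * (k - mu) ** 2 / mu) / np.sqrt(2.0 * np.pi * mu)

    print(np.abs(poisson - gauss).max())  # ~0.01: close, but the Poisson
                                          # skew is still visible at mu = 10
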
>
> I can't address Anne's results, but I've certainly found it to make a
> difference in my work, and it's pretty standard in high-energy
> astronomy. The central limit theorem does not get you out of having to
> use the right PDF to compare data to model. It does mean that, if you
> have enough events, the PDF of the fitness function is fairly
> chi-squarish, so much of the logic developed for least-squares fitting
> still applies (like for finding confidence intervals on the fitted
> parameters). Using Poisson statistics instead of a Gaussian
> approximation is actually pretty easy; see Cash (Astrophysical Journal,
> Part 1, vol. 228, Mar. 15, 1979, p. 939-947,
> http://adsabs.harvard.edu/abs/1979ApJ...228..939C).
>
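
The statistic in that paper amounts to minimizing, over the model
parameters, C = 2 * sum_i (m_i - n_i * ln m_i), where m_i is the model
mean and n_i the observed counts in bin i (the parameter-independent
ln n_i! term is dropped). A sketch with a stand-in exponential model,
not a blackbody:

    import numpy as np
    from scipy.optimize import fmin

    def cash(params, energies, counts):
        # Cash (1979) C statistic, up to a model-independent constant.
        T, A = params
        model = A * np.exp(-energies / T)  # placeholder spectral model
        return 2.0 * np.sum(model - counts * np.log(model))

    # e.g.: T_fit, A_fit = fmin(cash, [1.0, 50.0], args=(energies, counts))
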

Interesting paper.  I've also been playing with a 64-bit floating-point
format with a 31-bit offset-binary exponent and a 33-bit mantissa with a
hidden bit, which can hold the probability of any given sequence of
2**30 - 1 coin tosses, or about 1e-323228496. It looks to be pretty
efficient for multiplication and comparison, and functions like log and
exp aren't hard to do.
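
A quick sanity check of that dynamic range, in plain Python:

    import math

    # The probability of one specific sequence of n fair coin tosses
    # is 2**-n.
    n = 2**30 - 1
    print(-n * math.log10(2.0))  # about -323228496, so the smallest
                                 # representable value is ~1e-323228496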

Chuck