[Numpy-discussion] Floating Point Difference between numpy and numarray
Tue Sep 9 02:25:17 CDT 2008
On Tue, 2008-09-09 at 07:53 +0100, Hanni Ali wrote:
> Hi David,
> Forgot to answer last week, I was under a fair bit of pressure time
> wise, but thanks for your input. I sorted it all in the end and just
> in time, but the main issue here was the change from numarray to
> numpy. Previously where a typecode of 'f' was used in numarray, the
> calculation was performed in double precision whereas in numpy it was
> calculated in single precision. Hence when migrating the code, the
> differences popped up, which were fairly big when considering the size
> and number of mean calcs we perform.
glad it worked ok for you.
> I now have a distinct dislike of float values (it'll probably wear off
> over time), how can the sum of 100,000 numbers be anything other than
> the sum of those numbers. I know the reasoning, as highlighted by the
> couple of other e-mails we have had, but I feel the default should
> probably lean towards accuracy than speed. 2.0+2.0=4.0 and 2.0
> +2.0.....=200,000.0 not 2array.sum() != 200,000...
I think it is a fallacy to say you prefer accuracy over speed: the
fallacy is in thinking it is a binary choice. You care about speed,
because otherwise you would not use a computer at all, you would do
everything by hand. Floating point is by itself an approximation: it
cannot even represent every rational number exactly, let alone algebraic
or transcendental numbers! There are packages to do exact
computation (look at sage, for example, for something based on python),
but numpy/scipy are first and foremost about numerical computation,
which means approximation along the way.
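
A minimal sketch of the point about rational numbers, using only the
Python standard library (the variable names are illustrative): 1/10 is
rational, but it has no finite binary expansion, so a double can only
store its nearest neighbour.

```python
from fractions import Fraction

# Fraction(0.1) recovers the exact value of the double nearest to 0.1.
stored = Fraction(0.1)
exact = Fraction(1, 10)

print(stored == exact)      # False: the stored value is not exactly 1/10
print(0.1 + 0.2 == 0.3)     # False: the rounding errors do not cancel

# Exact rational arithmetic has no such problem, at a cost in speed.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
```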
It is true that it can give some unexpected results, and you should be
aware of floating point limitations. That being said, for a lot of
computations, when you get an unexpected difference between float and
double, you have a problem in your implementation. For example, IIRC,
you computed the average of a large number of values all at once: you
can get better results if you first normalize your numbers. Another
example which bites me all the time in statistics is computing
exponentials of large negative numbers: log(exp(-1000)) will be -Inf if
done naively, because exp(-1000) underflows to 0, but you and I know
the answer is of course -1000; again, you should think more about your
computation.
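
Both examples can be sketched with numpy as follows. This is only an
illustration under stated assumptions: the data are made up, "normalize"
is interpreted here as subtracting a rough estimate of the mean before
summing, and np.logaddexp is just one way of staying in the log domain,
not necessarily the approach meant above.

```python
import numpy as np

# Hypothetical data: one million float32 values between 1000 and 1001.
rng = np.random.default_rng(0)
x = (1000.0 + rng.random(1_000_000)).astype(np.float32)

# Reference: same float32 data, but accumulated in double precision.
reference = x.mean(dtype=np.float64)

# Naive: a sequential running sum kept in single precision.
# (cumsum adds one element at a time; np.sum in current NumPy uses
# pairwise summation and is typically more accurate than this.)
naive = float(np.cumsum(x, dtype=np.float32)[-1]) / x.size

# "Normalized": subtract a rough location first so the running sum
# stays small, then add the location back at the end.
shift = x[0]
residual_sum = np.cumsum(x - shift, dtype=np.float32)[-1]
centered = float(shift) + float(residual_sum) / x.size

print("reference:", reference)
print("naive float32 error:   ", abs(naive - reference))
print("centered float32 error:", abs(centered - reference))

# log(exp(-1000)): exp underflows to 0.0, so the log is -inf.
with np.errstate(divide="ignore"):
    naive_log = np.log(np.exp(-1000.0))
print(naive_log)

# Staying in the log domain avoids the underflow. For example,
# log(exp(a) + exp(b)) can be computed without forming exp(a):
stable = np.logaddexp(-1000.0, -1000.0)   # equals -1000 + log(2)
print(stable)
```

Passing dtype=np.float64 to mean() or sum() keeps the array itself in
single precision and only widens the accumulator.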
IOW, floating point is a useful approximation/abstraction (I don't know
if you are familiar with fixed point computation, as done in some DSPs,
but it is not pretty), but it breaks in some cases.
I know some people do work by hand for some kinds of computation; in a
different context from numerical computation, I found the following
interview with Alain Connes (one of the most famous French mathematicians
currently alive) to be extremely enlightening:
http://www.ipm.ac.ir/IPM/news/connes-interview.pdf (see pages 2-3 for the
discussion about computers and computation)
"What Every Computer Scientist Should Know About Floating-Point
Arithmetic", David Goldberg, ACM Computing Surveys, 1991.