[Numpy-tickets] [NumPy] #924: problem with summing large array of float32
NumPy
numpy-tickets@scipy....
Fri Oct 3 12:11:57 CDT 2008
#924: problem with summing large array of float32
-------------------------------+--------------------------------------------
 Reporter:  emil               |        Owner:  somebody
     Type:  defect             |       Status:  reopened
 Priority:  high               |    Milestone:
Component:  numpy.core         |      Version:  1.0.1
 Severity:  critical           |   Resolution:
 Keywords:  sum, dot, float32  |
-------------------------------+--------------------------------------------
Changes (by emil):
* status: closed => reopened
* resolution: wontfix =>
Comment:
Thanks for clarifying the issue. I should have realized that it was
round-off error, especially after I went to Fortran.
But I would still ask that a change be made so that the accumulators for
sum, dot, and other similar functions are float64 by default (note that
dot doesn't seem to have an option to change the type of the accumulator).
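
For reference, a minimal sketch of the workaround available without any
change to the defaults: sum takes a dtype argument that sets the accumulator
type, and for dot one option is to upcast the operands block by block. The
helper names sum_float64 and dot_float64_blocks, and the block size, are
hypothetical and not part of NumPy.

```python
import numpy as np

def sum_float64(a):
    """Sum a float32 array using a float64 accumulator."""
    # The dtype argument of sum selects the type used for accumulation.
    return a.sum(dtype=np.float64)

def dot_float64_blocks(a, b, block=1 << 20):
    """Hypothetical helper: dot product of two 1-D float32 arrays,
    accumulated in float64 one block at a time to limit extra memory."""
    total = np.float64(0.0)
    for start in range(0, a.size, block):
        # Upcast only the current block, so the temporary copies stay small.
        x = a[start:start + block].astype(np.float64)
        y = b[start:start + block].astype(np.float64)
        total += np.dot(x, y)
    return total

a = np.ones(100_000, dtype=np.float32)
print(sum_float64(a))            # accumulated in float64
print(dot_float64_blocks(a, a))  # accumulated in float64
```

The block-wise upcast keeps the stored data in float32, which matters when
the arrays are single-precision specifically to save space.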
Here's why:
For medical imaging, we use large single-precision arrays to save space.
These arrays are not sparse, and their entries have similar values
(for example, x-ray attenuation coefficients do not vary greatly over a
volume).
Processing operations used in computer-aided diagnosis involve summing
the array or dotting it with another array. As I found out, the error
introduced by a single-precision accumulator can be large. A user of a
high-level package such as MATLAB or NumPy generally doesn't expect this
kind of error when summing values that do not alternate in sign.
I have checked with my colleagues on the floor, and nobody suspected this
problem, although they understand it once I explain what is happening.
Some are now wondering whether they have errors in previous work.
Regarding the example above, where the sum result is 110880003:
if the exact sum is 110880003 and I get 110880000, I'm happy, because
the answer I got is reasonable for float32.
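
To make "reasonable for float32" concrete: near 1.1e8, adjacent float32
values are 8 apart, so 110880003 is not representable and rounds to
110880000. A quick check, assuming NumPy's spacing function is available:

```python
import numpy as np

exact = 110880003
as_f32 = np.float32(exact)  # rounds to the nearest representable float32

# Gap between adjacent float32 values at this magnitude.
gap = np.spacing(as_f32)

print(int(as_f32))
print(float(gap))
```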
My example above is extreme. Summing over a float32 array of ones with
shape (840,2200,60) yields 16,777,216 instead of 110,880,000. This answer
is way off, so I would probably suspect something is wrong.
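
The value 16,777,216 is 2**24. Above that point adjacent float32 values are
2 apart, so adding 1.0 to a float32 accumulator no longer changes it. The
sketch below imitates a plain sequential accumulator starting just below
that limit; sequential_sum is a hypothetical helper, and NumPy's own sum may
add the elements in a different order, so it will not necessarily reproduce
the figure above.

```python
import numpy as np

def sequential_sum(values, start, dtype):
    """Hypothetical helper: add values one by one into an accumulator
    that is rounded to the given dtype after every addition."""
    acc = dtype(start)
    for v in values:
        acc = dtype(acc + dtype(v))
    return acc

ones = np.ones(10, dtype=np.float32)
limit = 2 ** 24  # 16,777,216

f32 = sequential_sum(ones, limit - 5, np.float32)
f64 = sequential_sum(ones, limit - 5, np.float64)

print(int(f32))         # stalls once it reaches 2**24
print(int(f64))         # keeps counting past 2**24
print(840 * 2200 * 60)  # number of elements in the array
```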
The problem, however, is that this type of error can lead to computational
errors on the order of a few percent: inaccurate enough to cause problems,
but not inaccurate enough to be easily detected.
In fact, in the situation where I caught the problem, the sum result was
off by only 4%.
--
Ticket URL: <http://scipy.org/scipy/numpy/ticket/924#comment:2>
NumPy <http://projects.scipy.org/scipy/numpy>
The fundamental package needed for scientific computing with Python.