#465: ndarray's mean method should be computed using double precision
------------------------+---------------------------------------------------
Reporter: chanley | Owner: somebody
Type: defect | Status: new
Priority: normal | Milestone:
Component: numpy.core | Version:
Severity: normal | Keywords:
------------------------+---------------------------------------------------
The default data type for the accumulator variable in the mean method
should be double precision. The problem can best be illustrated with the
following example:
{{{
Python 2.4.3 (#2, Dec 7 2006, 11:01:45)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as n
>>> n.__version__
'1.0.2.dev3571'
>>> a = n.ones((1000,1000),dtype=n.float32)*132.00005
>>> print a
[[ 132.00004578 132.00004578 132.00004578 ..., 132.00004578
132.00004578 132.00004578]
[ 132.00004578 132.00004578 132.00004578 ..., 132.00004578
132.00004578 132.00004578]
[ 132.00004578 132.00004578 132.00004578 ..., 132.00004578
132.00004578 132.00004578]
...,
[ 132.00004578 132.00004578 132.00004578 ..., 132.00004578
132.00004578 132.00004578]
[ 132.00004578 132.00004578 132.00004578 ..., 132.00004578
132.00004578 132.00004578]
[ 132.00004578 132.00004578 132.00004578 ..., 132.00004578
132.00004578 132.00004578]]
>>> a.min()
132.000045776
>>> a.max()
132.000045776
>>> a.mean()
133.96639999999999
}}}
Having the mean be greater than the maximum is a tad odd.
The calculation of the mean is occurring with a single precision
accumulator variable. A user can force a double precision calculation
with the following command and receive a correct result:
{{{
>>> a.mean(dtype=n.float64)
132.00004577636719
>>>
}}}
However, this workaround is not going to be obvious to the casual user,
to whom the default result will simply appear to be an error.
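The effect can be reproduced with a minimal sketch along the following lines. It assumes a current NumPy on Python 3 rather than the 1.0.2 development build shown above, and it uses `np.cumsum` as a stand-in for a sequential single precision accumulator, because the default `mean` in newer NumPy releases may sum in a different order and hide the problem. The names `naive_mean` and `double_mean` are illustrative only. The likely mechanism: once the running total passes 2**26, adjacent float32 values are 8 apart, so each added 132.00004578 is rounded up to 136.

```python
import numpy as np

# Same data as in the report: one million float32 copies of 132.00005.
a = np.full((1000, 1000), 132.00005, dtype=np.float32)

# cumsum keeps a running total in the requested dtype, so its last
# element is the sum produced by a plain float32 accumulator.
naive_sum = np.cumsum(a, dtype=np.float32)[-1]
naive_mean = float(naive_sum) / a.size

# The same reduction with a double precision accumulator.
double_mean = a.mean(dtype=np.float64)

print("float32 accumulator:", naive_mean)
print("float64 accumulator:", double_mean)
print("max of the data:    ", float(a.max()))
```

With the single precision accumulator the reported mean should land above `a.max()`, while the double precision accumulator should return the stored element value.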
I realize that one reason for not doing all calculations in double
precision is performance. However, it is better to always receive the
correct answer than to quickly arrive at the wrong one.
The current default behavior needs to be changed: by default, the
accumulation should be done in double precision. If performance is
needed, the "expert user" can go back and start setting data types after
having shown that their application arrives at a correct result.
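As a rough sketch of the requested default (not a patch against numpy.core), the behavior could look like the wrapper below. The function name `mean_double_default` is hypothetical; it only relies on the existing `dtype` argument of `ndarray.mean`, and it assumes that "expert users" opt out by passing `dtype` explicitly.

```python
import numpy as np

def mean_double_default(arr, axis=None, dtype=None):
    """Hypothetical helper: mean with a float64 accumulator by default."""
    arr = np.asarray(arr)
    # Only override when the caller did not choose a dtype and the input
    # is a floating point type narrower than double precision.
    if dtype is None and arr.dtype.kind == "f" and arr.dtype.itemsize < 8:
        dtype = np.float64
    return arr.mean(axis=axis, dtype=dtype)

a = np.full((1000, 1000), 132.00005, dtype=np.float32)
print(mean_double_default(a))                    # accumulated in float64
print(mean_double_default(a, dtype=np.float32))  # explicit opt-out
```

The explicit `dtype=np.float32` call keeps the faster single precision path available for callers who have already verified their results.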
A double precision default would also spare users from worrying about
overflow and round-off problems in the accumulator variable, and would
make numpy consistent with numarray's behavior.
--
Ticket URL: <http://projects.scipy.org/scipy/numpy/ticket/465>