[Numpy-tickets] [NumPy] #465: ndarray's mean method should be computed using double precision

NumPy numpy-tickets@scipy....
Wed Mar 7 11:04:26 CST 2007


#465: ndarray's mean method should be computed using double precision
------------------------+---------------------------------------------------
 Reporter:  chanley     |       Owner:  somebody
     Type:  defect      |      Status:  new     
 Priority:  normal      |   Milestone:          
Component:  numpy.core  |     Version:          
 Severity:  normal      |    Keywords:          
------------------------+---------------------------------------------------
 The default data type for the accumulator variable in the mean method
 should be double precision.  The problem can best be illustrated with the
 following example:

 {{{
 Python 2.4.3 (#2, Dec  7 2006, 11:01:45)
 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import numpy as n
 >>> n.__version__
 '1.0.2.dev3571'
 >>> a = n.ones((1000,1000),dtype=n.float32)*132.00005
 >>> print a
 [[ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
    132.00004578  132.00004578]
  [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
    132.00004578  132.00004578]
  [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
    132.00004578  132.00004578]
  ...,
  [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
    132.00004578  132.00004578]
  [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
    132.00004578  132.00004578]
  [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
    132.00004578  132.00004578]]
 >>> a.min()
 132.000045776
 >>> a.max()
 132.000045776
 >>> a.mean()
 133.96639999999999
 }}}

 The reported mean is greater than the maximum element of the array, which
 cannot be correct.

 The mean is being computed with a single precision accumulator variable,
 so rounding error builds up as the running sum grows.  A user can force a
 double precision calculation by passing the dtype argument and receive a
 correct result:

 {{{
 >>> a.mean(dtype=n.float64)
 132.00004577636719
 >>>
 }}}
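The effect can be reproduced in isolation with a strictly sequential running sum. The sketch below is only an illustration: it assumes a naive left-to-right accumulation (using cumsum as a stand-in for the loop), which may not match what any particular numpy release does inside mean. Newer releases may sum in a different order, so a.mean() itself may give a different value than the one shown in the ticket. The reasoning is that once the float32 running sum passes 2**26, representable values are 8 apart, so each added 132.00004578 is rounded to a step of 136 rather than 132.

```python
import numpy as np

# Same data as in the ticket: one million float32 values near 132.00005.
a = np.ones((1000, 1000), dtype=np.float32) * np.float32(132.00005)
flat = a.ravel()

# Strictly sequential running sum, rounded to float32 after every addition.
# cumsum stands in for a naive accumulation loop (an assumption, see above).
sum32 = np.cumsum(flat, dtype=np.float32)[-1]
mean32 = float(sum32) / flat.size

# The same running sum carried in float64 instead.
sum64 = np.cumsum(flat, dtype=np.float64)[-1]
mean64 = float(sum64) / flat.size

print("max                :", float(a.max()))
print("float32 accumulator:", mean32)
print("float64 accumulator:", mean64)
```

With the float64 accumulator every partial sum is exactly representable for this input, so the only rounding left is in the final division.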

 However, this workaround is not going to be obvious to the casual user,
 and the default result will appear to be an error.

 I realize that one reason for not doing all calculations as double
 precision is performance.  However, it is probably better to always
 receive the correct answer than to quickly arrive at the wrong one.

 The current default behavior needs to be changed.  All calculations should
 be done in double precision.  If performance matters, the "expert user"
 can go back and set data types explicitly after having shown that their
 application arrives at a correct result.
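Until the default changes, the proposed behavior can be approximated in user code. The following is a minimal sketch, not numpy's implementation: mean_double_default is a hypothetical helper name, and the rule it applies (promote only floating point inputs narrower than 8 bytes, leave everything else to numpy's own default) is an assumption made for illustration.

```python
import numpy as np

def mean_double_default(arr, axis=None, dtype=None):
    """Hypothetical helper: accumulate in float64 unless a dtype is given."""
    arr = np.asarray(arr)
    # Promote narrow floating point inputs only when the caller did not
    # ask for a specific accumulator type.
    if dtype is None and arr.dtype.kind == "f" and arr.dtype.itemsize < 8:
        dtype = np.float64
    return np.mean(arr, axis=axis, dtype=dtype)

a = np.ones((1000, 1000), dtype=np.float32) * np.float32(132.00005)

safe = mean_double_default(a)                    # float64 accumulator
fast = mean_double_default(a, dtype=np.float32)  # explicit "expert" choice

print(safe, safe.dtype)
print(fast, fast.dtype)
```

The explicit dtype argument keeps the single precision path available for users who have verified that it is accurate enough for their data.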

 Using a double precision accumulator by default would also make numpy
 consistent with numarray's behavior, and users would not have to worry
 about overflow or precision loss in the accumulator variable.

-- 
Ticket URL: <http://projects.scipy.org/scipy/numpy/ticket/465>
NumPy <http://projects.scipy.org/scipy/numpy>
The fundamental package needed for scientific computing with Python.

