[Numpy-discussion] large float32 array issue

Bruce Southey bsouthey@gmail....
Wed Nov 3 09:04:18 CDT 2010


On 11/03/2010 06:52 AM, Pauli Virtanen wrote:
> Wed, 03 Nov 2010 12:39:08 +0100, Vincent Schut wrote:
> [clip]
>> Btw, should I file a bug on this?
> One can argue that mean() and sum() should use a numerically stabler
> algorithm, so yes, a bug can be filed if there is not yet one already.
>
This is a 'user bug', not a numpy bug, because it is a well-known 
numerical problem. I recall that we have had this type of discussion 
before, and it resulted in these functions being left as they are. The 
numerical problem is covered better in the np.mean docstring than in 
the np.sum docstring.
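For anyone unfamiliar with the problem: float32 has a 24-bit significand, so once a running sum reaches 2**24 the spacing between representable values is 2.0 and adding 1.0 no longer changes it. A minimal sketch (np.cumsum is used here because it accumulates sequentially, like the naive sum under discussion):

```python
import numpy as np

# At magnitude 2**24 the float32 spacing is 2.0, so adding 1.0
# is absorbed entirely by the rounding.
acc = np.float32(2**24)
print(acc + np.float32(1.0) == acc)   # True

# A sequential float32 running sum of ones therefore saturates
# at 2**24 and never gets any larger.
running = np.cumsum(np.ones(2**24 + 1000, dtype=np.float32))
print(running[-1])                    # 16777216.0, not 16778216.0
```

This is exactly why a float32 mean over 11334*16002 ~ 1.8e8 ones comes out so far below 1.0.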

My understanding was that any new algorithm has to beat the current one 
in both speed and accuracy across 'typical' numpy problems and across 
the different Python and OS versions, not just in numerically 
challenged cases. For example, I would not want to sacrifice speed if 
the same accuracy can be had more cheaply by simply changing the dtype 
to float128 (as I use x86_64 Linux).
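To make the tradeoff concrete, here is a sketch of Kahan (compensated) summation, one candidate for a "numerically stabler" algorithm: it carries a correction term that recovers the bits a plain float32 sum throws away, at the cost of roughly four floating-point operations per element instead of one. The pure-Python loop is only illustrative; a real implementation would live in C.

```python
import numpy as np

def kahan_sum_f32(values):
    """Compensated summation entirely in float32 (illustrative sketch)."""
    total = np.float32(0.0)
    c = np.float32(0.0)          # running compensation for lost low bits
    for v in values:
        y = np.float32(v) - c    # apply the correction first
        t = total + y
        c = (t - total) - y      # the part of y absorbed by rounding
        total = t
    return total

# 2**24 followed by 100 ones: at magnitude 2**24 the float32 spacing
# is 2.0, so a naive running sum absorbs every one of the 1.0 terms.
vals = np.concatenate([np.float32([2**24]), np.ones(100, dtype=np.float32)])

naive = np.float32(0.0)
for v in vals:
    naive = naive + v

print(naive)                 # 16777216.0 -- the 100 ones were lost
print(kahan_sum_f32(vals))   # 16777316.0 -- recovered by the compensation
```

Whether that extra per-element cost is acceptable for np.sum/np.mean by default is exactly the speed-versus-accuracy question above.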

Also, in Warren's mean example this is simply a 32-bit precision error, 
because it disappears when using 64-bit (numpy's default) - well, until 
we reach similarly extreme 64-bit values.

>>> np.ones((11334,16002)).mean()
1.0
>>> np.ones((11334,16002),np.float32).mean()
0.092504406598019437
>>> np.ones((11334,16002),np.float32).mean().dtype
dtype('float64')
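For what it's worth, the accuracy loss can be worked around today without changing any algorithm, because mean() and sum() accept a dtype argument for the accumulator. A sketch (using a smaller array than Warren's, to keep it cheap):

```python
import numpy as np

# Keep the data in float32 to save memory, but ask mean() to
# accumulate in float64 via its dtype argument.
a = np.ones(10_000_000, dtype=np.float32)
print(a.mean(dtype=np.float64))   # 1.0
```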

Note that there is probably a bug in np.mean: a 64-bit dtype is 
returned for integers and for 32-bit or lower precision floats, yet 
the result shows the accumulation was done at the input precision. So 
the upcast is apparently not being applied to the accumulator.


Bruce



