[Numpy-discussion] large float32 array issue
Bruce Southey
bsouthey@gmail....
Wed Nov 3 09:04:18 CDT 2010
On 11/03/2010 06:52 AM, Pauli Virtanen wrote:
> Wed, 03 Nov 2010 12:39:08 +0100, Vincent Schut wrote:
> [clip]
>> Btw, should I file a bug on this?
> One can argue that mean() and sum() should use a numerically stabler
> algorithm, so yes, a bug can be filed if there is not yet one already.
>
This is a 'user bug' not a numpy bug because it is a well known
numerical problem. I recall that we have had this type of discussion
before that has resulted in these functions being left as they are. The
numerical problem is mentioned better in the np.mean docstring than the
np.sum docstring.
My understanding was that any new algorithm has to be better than the
current algorithm especially in speed and accuracy across 'typical'
numpy problems across the different Python and OS versions not just for
numerically challenged cases. For example, I would not want to sacrifice
speed if I achieve the same accuracy without losing as much speed as
just changing the dtype to float128 (as I use x86_64 Linux).
Also in Warren's mean example, this is simply a 32-bit error because it
disappears when using 64-bit (numpy's default) - well, until we reach
the extreme 64-bit values.
>>> np.ones((11334,16002)).mean()
1.0
>>> np.ones((11334,16002),np.float32).mean()
0.092504406598019437
>>> np.ones((11334,16002),np.float32).mean().dtype
dtype('float64')
Note that there is probably a bug in np.mean because a 64-bit dtype is
returned for integers and 32-bit or lower precision floats. So upcast is
not apparently being done on the accumulator.
Bruce
More information about the NumPy-Discussion
mailing list