[Numpy-discussion] large float32 array issue

Vincent Schut schut@sarvision...
Thu Nov 4 04:56:23 CDT 2010

On 11/03/2010 03:04 PM, Bruce Southey wrote:
> On 11/03/2010 06:52 AM, Pauli Virtanen wrote:
>> Wed, 03 Nov 2010 12:39:08 +0100, Vincent Schut wrote:
>> [clip]
>>> Btw, should I file a bug on this?
>> One can argue that mean() and sum() should use a numerically stabler
>> algorithm, so yes, a bug can be filed if there is not yet one already.
> This is a 'user bug' not a numpy bug because it is a well known
> numerical problem. I recall that we have had this type of discussion
> before that has resulted in these functions being left as they are. The
> numerical problem is mentioned better in the np.mean docstring than the
> np.sum docstring.
> My understanding was that any new algorithm has to be better than the
> current algorithm especially in speed and accuracy across 'typical'
> numpy problems across the different Python and OS versions not just for
> numerically challenged cases. For example, I would not want to sacrifice
> speed if I achieve the same accuracy without losing as much speed as
> just changing the dtype to float128 (as I use x86_64 Linux).
> Also in Warren's mean example, this is simply a 32-bit error because it
> disappears when using 64-bit (numpy's default) - well, until we reach
> the extreme 64-bit values.
>   >>>  np.ones((11334,16002)).mean()
> 1.0
>   >>>  np.ones((11334,16002),np.float32).mean()
> 0.092504406598019437
>   >>>  np.ones((11334,16002),np.float32).mean().dtype
> dtype('float64')
> Note that there is probably a bug in np.mean because a 64-bit dtype is
> returned for integers and 32-bit or lower precision floats. So upcast is
> not apparently being done on the accumulator.
> Bruce

Thanks for the info, all. I agree that this is a 'user bug', however, 
mentioning this as a corner case someplace a user would look when 
finding errors like these might be an idea, as I have the feeling it 
will keep turning up once and again at this list otherwise. Maybe start 
a webpage 'numpy and some often encountered floating point issues'?

For now, I've just ordered more memory. The cheapest and simplest 
solution, if you ask me :-)


More information about the NumPy-Discussion mailing list