[Numpy-discussion] large float32 array issue
Vincent Schut
schut@sarvision...
Thu Nov 4 04:56:23 CDT 2010
On 11/03/2010 03:04 PM, Bruce Southey wrote:
> On 11/03/2010 06:52 AM, Pauli Virtanen wrote:
>> Wed, 03 Nov 2010 12:39:08 +0100, Vincent Schut wrote:
>> [clip]
>>> Btw, should I file a bug on this?
>> One can argue that mean() and sum() should use a numerically stabler
>> algorithm, so yes, a bug can be filed if there is not yet one already.
>>
> This is a 'user bug' not a numpy bug because it is a well known
> numerical problem. I recall that we have had this type of discussion
> before that has resulted in these functions being left as they are. The
> numerical problem is mentioned better in the np.mean docstring than the
> np.sum docstring.
>
> My understanding was that any new algorithm has to be better than the
> current algorithm especially in speed and accuracy across 'typical'
> numpy problems across the different Python and OS versions not just for
> numerically challenged cases. For example, I would not want to sacrifice
> speed if I achieve the same accuracy without losing as much speed as
> just changing the dtype to float128 (as I use x86_64 Linux).
>
> Also in Warren's mean example, this is simply a 32-bit error because it
> disappears when using 64-bit (numpy's default) - well, until we reach
> the extreme 64-bit values.
>
> >>> np.ones((11334,16002)).mean()
> 1.0
> >>> np.ones((11334,16002),np.float32).mean()
> 0.092504406598019437
> >>> np.ones((11334,16002),np.float32).mean().dtype
> dtype('float64')
>
> Note that there is probably a bug in np.mean because a 64-bit dtype is
> returned for integers and 32-bit or lower precision floats. So upcast is
> not apparently being done on the accumulator.
>
>
> Bruce
Thanks for the info, all. I agree that this is a 'user bug', however,
mentioning this as a corner case someplace a user would look when
finding errors like these might be an idea, as I have the feeling it
will keep turning up once and again at this list otherwise. Maybe start
a webpage 'numpy and some often encountered floating point issues'?
For now, I've just ordered more memory. The cheapest and simplest
solution, if you ask me :-)
Vincent.
More information about the NumPy-Discussion
mailing list