[Numpy-discussion] Bug in logaddexp2.reduce

Anne Archibald peridot.faceted@gmail....
Thu Apr 1 01:56:37 CDT 2010


On 1 April 2010 02:49, David Cournapeau <david@silveregg.co.jp> wrote:
> Charles R Harris wrote:
>> On Thu, Apr 1, 2010 at 12:04 AM, Anne Archibald
>> <peridot.faceted@gmail.com>wrote:
>>
>>> On 1 April 2010 01:59, Charles R Harris <charlesr.harris@gmail.com> wrote:
>>>>
>>>> On Wed, Mar 31, 2010 at 11:46 PM, Anne Archibald <
>>> peridot.faceted@gmail.com>
>>>> wrote:
>>>>> On 1 April 2010 01:40, Charles R Harris <charlesr.harris@gmail.com>
>>> wrote:
>>>>>>
>>>>>> On Wed, Mar 31, 2010 at 11:25 PM, <josef.pktd@gmail.com> wrote:
>>>>>>> On Thu, Apr 1, 2010 at 1:22 AM,  <josef.pktd@gmail.com> wrote:
>>>>>>>> On Thu, Apr 1, 2010 at 1:17 AM, Charles R Harris
>>>>>>>> <charlesr.harris@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Mar 31, 2010 at 6:08 PM, <josef.pktd@gmail.com> wrote:
>>>>>>>>>> On Wed, Mar 31, 2010 at 7:37 PM, Warren Weckesser
>>>>>>>>>> <warren.weckesser@enthought.com> wrote:
>>>>>>>>>>> T J wrote:
>>>>>>>>>>>> On Wed, Mar 31, 2010 at 1:21 PM, Charles R Harris
>>>>>>>>>>>> <charlesr.harris@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Looks like roundoff error.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> So this is "expected" behavior?
>>>>>>>>>>>>
>>>>>>>>>>>> In [1]: np.logaddexp2(-1.5849625007211563,
>>> -53.584962500721154)
>>>>>>>>>>>> Out[1]: -1.5849625007211561
>>>>>>>>>>>>
>>>>>>>>>>>> In [2]: np.logaddexp2(-0.5849625007211563,
>>> -53.584962500721154)
>>>>>>>>>>>> Out[2]: nan
>>>>>>>>>>>>
>>>>>>>>>>> Is any able to reproduce this?  I don't get 'nan' in either
>>> 1.4.0
>>>>>>>>>>> or
>>>>>>>>>>> 2.0.0.dev8313 (32 bit Mac OSX).  In an earlier email T J
>>> reported
>>>>>>>>>>> using
>>>>>>>>>>> 1.5.0.dev8106.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> np.logaddexp2(-0.5849625007211563, -53.584962500721154)
>>>>>>>>>> nan
>>>>>>>>>>>>> np.logaddexp2(-1.5849625007211563, -53.584962500721154)
>>>>>>>>>> -1.5849625007211561
>>>>>>>>>>
>>>>>>>>>>>>> np.version.version
>>>>>>>>>> '1.4.0'
>>>>>>>>>>
>>>>>>>>>> WindowsXP 32
>>>>>>>>>>
>>>>>>>>> What compiler? Mingw?
>>>>>>>> yes, mingw 3.4.5. , official binaries release 1.4.0 by David
>>>>>>> sse2 Pentium M
>>>>>>>
>>>>>> Can you try the exp2/log2 functions with the problem data and see if
>>>>>> something goes wrong?
>>>>> Works fine for me.
>>>>>
>>>>> If it helps clarify things, the difference between the two problem
>>>>> input values is exactly 53 (and that's what logaddexp2 does an exp2
>>>>> of); so I can provide a simpler example:
>>>>>
>>>>> In [23]: np.logaddexp2(0, -53)
>>>>> Out[23]: nan
>>>>>
>>>>> Of course, for me it fails in both orders.
>>>>>
>>>> Ah, that's progress then ;) The effective number of bits in a double is
>>> 53
>>>> (52 + implicit bit). That shouldn't cause problems but it sure looks
>>>> suspicious.
>>> Indeed, that's what led me to the totally wrong suspicion that
>>> denormals have something to do with the problem. More data points:
>>>
>>> In [38]: np.logaddexp2(63.999, 0)
>>> Out[38]: nan
>>>
>>> In [39]: np.logaddexp2(64, 0)
>>> Out[39]: 64.0
>>>
>>> In [42]: np.logaddexp2(52.999, 0)
>>> Out[42]: 52.999000000000002
>>>
>>> In [43]: np.logaddexp2(53, 0)
>>> Out[43]: nan
>>>
>>> It looks to me like perhaps the NaNs are appearing when the smaller
>>> term affects only the "extra" bits provided by the FPU's internal
>>> larger-than-double representation. Some such issue would explain why
>>> the problem seems to be hardware- and compiler-dependent.
>>>
>>>
>> Hmm, that 63.999 is kinda strange. Here is a bit of code that might confuse
>> a compiler working with different size mantissas:
>>
>> @type@ npy_log2_1p@c@(@type@ x)
>> {
>>     @type@ u = 1 + x;
>>     if (u == 1) {
>>         return LOG2E*x;
>>     } else {
>>         return npy_log2@c@(u) * x / (u - 1);
>>     }
>> }
>>
>> It might be that u != 1 does not imply u-1 != 0.
>
> I don't think that's true for floating point, since the spacing at 1 and
> at 0 are quite different, and that's what defines whether two floating
> point numbers are close or not.
>
> But I think you're right about the problem. I played a bit on my netbook
> where I have the same problem (Ubuntu 10.04 beta, gc 4.4.3, 32 bits).
> The problem is in npy_log2_1p (for example, try npy_log2_1p(1e-18) vs
> npy_log2_p1(1e-15)). If I compile with -ffloat-store, the problem also
> disappears.
>
> I think the fix is to do :
>
> @type@ npy_log2_1p@c@(@type@ x)
> {
>      @type@ u = 1 + x;
>      if ((u-1) == 0) {
>          return LOG2E*x;
>      } else {
>          return npy_log2@c@(u) * x / (u - 1);
>      }
> }

Particularly given the comments in the boost source code, I'm leery of
this fix; who knows what an optimizing compiler will do with it? What
if, for example, u-1 and u get treated differently - one gets
truncated to double, but the other doesn't, for example? Now the
"correction" is completely bogus. Or, of course, the compiler might
rewrite the expressions. At the least an explicit "volatile" variable
or two is necessary, and even so making it compiler-proof seems like a
real challenge. At the least it's letting us in for all sorts of
subtle trouble down the road (though at least unit tests can catch
it).

Since this is a subtle issue, I vote for delegating it (and log10_1p)
to log1p. If *that* has trouble, well, at least it'll only be on
non-C99 machines, and we can try compiler gymnastics.

> this works for a separate C program, but for some reason, once I try
> this in numpy, it does not work, and I have no battery anymore,

Clearly the optimizing compiler is inserting the DRDB (drain David's
battery) opcode.


Anne


More information about the NumPy-Discussion mailing list