[Numpy-discussion] Weird upcast behavior with 1.6.x, working as intended?

Olivier Delalleau shish@keba...
Fri Sep 30 22:58:35 CDT 2011


2011/9/30 Mark Wiebe <mwwiebe@gmail.com>

> On Fri, Sep 23, 2011 at 1:52 PM, Olivier Delalleau <shish@keba.be> wrote:
>
>> NB: I opened a ticket (http://projects.scipy.org/numpy/ticket/1949) about
>> this, in case it helps to get some attention on this issue.
>>
>
> A lot of what you're seeing here is due to changes I made for 1.6. I
> generally made the casting mechanism symmetric (before, it could give
> different types depending on the order of the input arguments), and added a
> little bit of value-based casting for scalars to reduce some of the overflow
> that could happen. Before, it always downcast to the smallest-size type
> regardless of the value in the scalar.
>
>
>> Besides this, I've been experimenting with the cast mechanisms of mixed
>> scalar / array operations in numpy 1.6.1 on a Linux x86_64 architecture, and
>> I can't make sense out of the current behavior. Here are some experiments
>> adding a two-element array to a scalar (both of integer types):
>>
>> (1) [0 0] (int8) + 0 (int32) -> [0 0] (int8)
>> (2) [0 0] (int8) + 127 (int32) -> [127 127] (int16)
>> (3) [0 0] (int8) + -128 (int32) -> [-128 -128] (int8)
>> (4) [0 0] (int8) + 2147483647 (int32) -> [2147483647 2147483647] (int32)
>> (5) [1 1] (int8) + 127 (int32) -> [128 128] (int16)
>> (6) [1 1] (int8) + 2147483647 (int32) -> [-2147483648 -2147483648] (int32)
>> (7) [127 127] (int8) + 1 (int32) -> [-128 -128] (int8)
>> (8) [127 127] (int8) + 127 (int32) -> [254 254] (int16)
>>
>> Here are some examples of things that confuse me:
>> - Output dtype in (2) is int16 while in (3) it is int8, although both
>> results can be written as int8
>>
>
> Here would be the cause of it:
>
>
> https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/convert_datatype.c#L1098
>
> It should be a <= instead of a <, to include the value 127.
>
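To illustrate the off-by-one described above, here is a minimal sketch in pure Python. It is not a copy of the C source: the function name, the flag, and the choice to model only the int8 upper bound as affected are assumptions made for illustration.

```python
def smallest_signed_dtype(value, int8_upper_inclusive):
    """Name the smallest signed integer type that holds `value`.

    Hypothetical model only. `int8_upper_inclusive=False` mimics a strict
    `<` comparison against the int8 maximum; `True` mimics `<=`.
    """
    if int8_upper_inclusive:
        below_int8_max = value <= 127
    else:
        below_int8_max = value < 127  # strict check excludes 127 itself
    if value >= -128 and below_int8_max:
        return "int8"
    if -32768 <= value <= 32767:
        return "int16"
    if -2**31 <= value <= 2**31 - 1:
        return "int32"
    return "int64"


# With the strict comparison, 127 is pushed to int16 while -128 stays int8,
# which matches the asymmetry between experiments (2) and (3).
print(smallest_signed_dtype(127, int8_upper_inclusive=False))
print(smallest_signed_dtype(-128, int8_upper_inclusive=False))
print(smallest_signed_dtype(127, int8_upper_inclusive=True))
```

Under this model, 126 is still classified as int8 even with the strict comparison, which is consistent with 127 being the only "special" value in the experiments.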
>
>> - Adding a number that would cause an overflow causes the output dtype to
>> be upgraded to a dtype that can hold the result in (5), but not in (6)
>>
>
> Actually, it's upgraded because of the previous point, not because of the
> overflow. With the change to <= above, this would produce int8.
>
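The wrapped values in the experiments can be reproduced with plain fixed-width arithmetic. The sketch below is pure Python and does not call NumPy; the helper names are hypothetical, and it assumes two's-complement wraparound for signed types.

```python
def wrap_signed(value, bits):
    """Reduce `value` modulo 2**bits and reinterpret it as two's complement."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def wrap_unsigned(value, bits):
    """Reduce `value` modulo 2**bits, keeping it non-negative."""
    return value & ((1 << bits) - 1)


print(wrap_signed(1 + 127, 8))           # (5) if the result stayed int8
print(wrap_signed(1 + 2147483647, 32))   # (6) in int32
print(wrap_signed(127 + 1, 8))           # (7) in int8
print(wrap_signed(127 + 126, 8))         # 126 variant of (8) in int8
print(wrap_unsigned(0 - 1, 32))          # (10) in uint32
```

In every case the output dtype is decided before the addition happens, so the sum simply wraps inside whatever width was chosen.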
>
>> - Adding a small int32 in (7) that causes an overflow makes it keep the
>> base int8 dtype, but a bigger int32 (although still representable as an
>> int8) in (8) makes it switch to int16 (if someone wonders, adding 126
>> instead of 127 in (8) would result in [-3 -3] (int8), so 127 is special for
>> some reason).
>>
>> My feeling is actually that the logic is to try to downcast the scalar as
>> much as possible without changing its value, but with a bug that 127 is not
>> downcast to int8, and remains int16 (!).
>>
>> Some more behavior that puzzles me, this time comparing + vs -:
>> (9) [0 0] (uint32) + -1 (int32) -> [-1 -1] (int64)
>> (10) [0 0] (uint32) - 1 (int32) -> [4294967295 4294967295] (uint32)
>>
>> Here I would expect that adding -1 would be the same as subtracting 1, but
>> that is not the case.
>>
>
> In the second case, it's equivalent to np.subtract(np.array([0, 0],
> np.uint32), np.int32(1)). The scalar 1 fits into the uint32, so the result
> type of the subtraction is uint32. In the first case, the scalar -1 does not
> fit into the uint32, so it is upgraded to int64.
>
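The rule described above (keep the array's type when the scalar's value fits in it, otherwise widen) can be sketched as a small pure-Python model. This is a simplification covering integer operands only, not NumPy's implementation; all names are hypothetical, and the choice of which family of types to search when widening is an assumption.

```python
SIGNED = ["int8", "int16", "int32", "int64"]
UNSIGNED = ["uint8", "uint16", "uint32", "uint64"]


def bounds(name):
    """Return (lo, hi) for a dtype name such as 'int8' or 'uint32'."""
    unsigned = name.startswith("u")
    bits = int(name[4:] if unsigned else name[3:])
    if unsigned:
        return 0, 2**bits - 1
    return -(2**(bits - 1)), 2**(bits - 1) - 1


def model_result_dtype(array_dtype, scalar_value):
    """Model the output dtype of `array (op) scalar` for integer operands."""
    lo, hi = bounds(array_dtype)
    if lo <= scalar_value <= hi:
        return array_dtype  # the scalar fits, so the array's type wins
    # Otherwise look for a type covering both the array's range and the scalar.
    need_lo, need_hi = min(lo, scalar_value), max(hi, scalar_value)
    if array_dtype.startswith("u") and scalar_value >= 0:
        candidates = UNSIGNED
    else:
        candidates = SIGNED
    for name in candidates:
        c_lo, c_hi = bounds(name)
        if c_lo <= need_lo and need_hi <= c_hi:
            return name
    raise ValueError("no integer type in this model covers the operands")


print(model_result_dtype("uint32", -1))  # adding -1, as in (9)
print(model_result_dtype("uint32", 1))   # subtracting 1, as in (10)
```

The model only looks at the scalar operand's value, not at the operation, which is why adding -1 and subtracting 1 end up with different output types.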
>
>>
>> Is there anyone with intimate knowledge of the numpy casting behavior for
>> mixed scalar / array operations who could explain the rules governing it?
>>
>
> Hopefully my explanations help a bit. I think this situation is less than
> ideal, and it would be better to do something more automatic, like doing an
> up-conversion on overflow. This would more closely emulate Python's behavior
> of integers never overflowing, at least until 64 bits. This kind of change
> would be a fair bit of work, and would likely reduce the performance of
> NumPy slightly.
>
> Cheers,
> Mark
>
>
Thanks! It's reassuring to hear that part of it is caused by a bug, and the
other part has some logic behind it (even though it leads to surprising
results). I appreciate you taking the time to clear it up for me :)

-=- Olivier