[Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

Olivier Delalleau shish@keba...
Fri Jan 4 08:03:02 CST 2013


2013/1/3 Andrew Collette <andrew.collette@gmail.com>:
>> Another solution is to forget about trying to be smart and always
>> upcast the operation. That would be my 2nd preferred solution, but it
>> would make it very annoying to deal with Python scalars (typically
>> int64 / float64) that would be upcasting lots of things, potentially
>> breaking a significant amount of existing code.
>>
>> So, personally, I don't see a straightforward solution without
>> warning/error, that would be safe enough for programmers.
>
> I guess what's really confusing me here is that I had assumed that this:
>
> result = myarray + scalar
>
> was equivalent to this:
>
> result = myarray + numpy.array(scalar)
>
> where the dtype of the converted scalar was chosen to be "just big
> enough" for it to fit.  Then you proceed using the normal rules for
> array addition.  Yes, you can have upcasting or rollover depending on
> the values involved, but you have that anyway with array addition;
> it's just how arrays work in NumPy.

A key difference is that with arrays, the dtype is not chosen "just
big enough" for your data to fit. Either you set the dtype yourself,
or you're using the default inferred dtype (int/float). In both cases
you should know what to expect, and it doesn't depend on the actual
numeric values (except for the auto int/float distinction).
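To make that difference concrete (assuming numpy imported as np, on a
64-bit platform; the int8 lines follow the 1.6 value-based rules we
are discussing, if I remember them correctly):

    np.array([1, 2, 3]).dtype           -> int64 (default, whatever the values)
    np.min_scalar_type(127)             -> uint8 ("just big enough" applies to scalars only)
    np.array([1], dtype=np.int8) + 127  -> array([-128], dtype=int8) (no upcast, rollover)
    np.array([1], dtype=np.int8) + 300  -> array([301], dtype=int16) (upcast)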

>
> Also, have I got this (proposed behavior) right?
>
> array([127], dtype=int8) + 128 -> ValueError
> array([127], dtype=int8) + 127 -> -2
>
> It seems like all this does is raise an error when the current rules
> would require upcasting, but still allows rollover for smaller values.
> What error condition, specifically, is the ValueError designed to
> tell me about? You can still get "unexpected" data (if you're not
> expecting rollover) with no exception.

The ValueError is there to warn you that the operation may not be
doing what you want. The rollover for smaller values would be the
documented (and thus hopefully expected) behavior.
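Here is a rough sketch of what the proposal amounts to for integer
scalars (checked_add is just an illustrative name, not a proposed
API):

    import numpy as np

    def checked_add(arr, scalar):
        # Raise if the Python scalar cannot be represented in the
        # array's dtype; otherwise keep that dtype and let the usual
        # rollover semantics apply.
        info = np.iinfo(arr.dtype)
        if not (info.min <= scalar <= info.max):
            raise ValueError("scalar %r out of range for %s"
                             % (scalar, arr.dtype))
        return arr + np.array(scalar, dtype=arr.dtype)

    checked_add(np.array([127], dtype=np.int8), 128)  -> ValueError
    checked_add(np.array([127], dtype=np.int8), 127)  -> array([-2], dtype=int8)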

Taking addition as the example may be misleading, as it makes it look
like we could just "always roll over" to obtain consistent behavior,
and programmers are to some extent used to integer rollover on this
kind of operation. However, I gave examples with "maximum" earlier in
this thread that I believe show it's not that easy (that behavior
would just appear "wrong"). Another example is integer division:
silently casting the scalar would wrap 128 around to -128 as an int8,
so that
    array([-128], dtype=int8) // 128 -> [1]
which is unlikely to be something anyone would want.
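Spelling out the mechanics (the cast uses C-style wraparound, at least
on my machine):

    np.array(128).astype(np.int8)                     -> -128 (wraps around)
    np.array([-128], dtype=np.int8) // np.int8(-128)  -> array([1], dtype=int8)

and the same silent wrap is what makes "maximum" look plain wrong:

    np.maximum(np.array([50], dtype=np.int8), 300)

If 300 were silently cast to int8 it would wrap to 44, and the result
would be array([50], dtype=int8), even though the true maximum is 300.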

To summarize the goals of the proposal (in my mind):
1. Low cognitive load (simple and consistent across ufuncs).
2. Low risk of doing something unexpected.
3. Efficient by default.
4. Most existing (non-buggy) code should not be affected.

If we always cast silently, we significantly break existing code
relying on the 1.6 behavior and increase the risk of doing something
unexpected (bad on #2 and #4).
If we always upcast, we may break existing code and lose efficiency
(bad on #3 and #4).
If we keep the current behavior, we stay with something that is
difficult to understand and carries a high risk of doing weird things
(bad on #1 and #2).

-=- Olivier

