[Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

Olivier Delalleau shish@keba...
Thu Jan 3 20:39:05 CST 2013


2013/1/3 Andrew Collette <andrew.collette@gmail.com>:
> Hi Dag,
>
>> If neither is objectively better, I think that is a very good reason to
>> kick it down to the user. "Explicit is better than implicit".
>
> I agree with you, up to a point.  However, we are talking about an
> extremely common operation that I think most people (myself included)
> would not expect to raise an exception: namely, adding a number to an
> array.
>
>> It's a good solution to encourage bug-free code. It may not be a good
>> solution to avoid typing.
>
> Ha!  But seriously, checking every time I make an addition?  And in
> the current version of numpy it's not buggy code to add 128 to an int8
> array; it's documented to give you an int16 with the result of the
> addition.  Maybe it shouldn't, but that's what it does.
>
>> I think you usually have a bug in your program when this happens, since
>> either the dtype is wrong, or the value one is trying to store is wrong.
>> I know that's true for myself, though I don't claim to know everybody
> else's use cases.
>
> I don't think it's unreasonable to add a number to an int16 array (or
> int32), and rely on specific, documented behavior if the number is
> outside the range.  For example, IDL will clip the value.  Up until
> 1.6, in NumPy it would roll over. Currently it upcasts.
>
> I won't make the case for upcasting vs rollover again, as I think
> that's dealt with extensively in the threads linked in the bug.  I am
> concerned about the tests I need to add wherever I might have a
> scalar, or the program blows up.
>
> It occurs to me that, if I have "a = b + c" in my code, and "c" is
> sometimes a scalar and sometimes an array, I will get different
> behavior.  If I have this right: if "c" is an array of a larger
> dtype, including a 1-element array, the result upcasts; if it's the
> same dtype, it rolls over regardless; but if it's a scalar and the
> value won't fit, it raises ValueError.
>
> By the way, how do I test for this?  I can't test just the scalar
> because the proposed behavior (as I understand it) considers the
> result of the addition.  Should I always compute amax (nanmax)? Do I
> need to try adding them and look for ValueError?
>
> And things like this suddenly become dangerous:
>
> try:
>     some_function(myarray + something)
> except ValueError:
>     print "Problem in some_function!"

Actually, the proposed behavior considers only the value of the
scalar, not the result of the addition.
So, under this proposal, the correct way to do things is to make sure
you never add to an array a scalar value that doesn't fit in the
array's dtype.
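
For integer dtypes, that check could look something like the sketch
below (the fits_dtype helper is purely illustrative, not an existing
numpy function):

    import numpy as np

    def fits_dtype(value, dtype):
        # True if `value` is representable in the integer `dtype`.
        info = np.iinfo(dtype)
        return info.min <= value <= info.max

    a = np.array([2], dtype='int8')
    if fits_dtype(128, a.dtype):
        b = a + 128
    else:
        raise ValueError("scalar does not fit in %s" % a.dtype)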

In 1.6.1, you should make this check anyway, since otherwise your
computation can silently do something completely different (and I
doubt it's what you'd want):
    In [50]: np.array([2], dtype='int8') + 127
    Out[50]: array([-127], dtype=int8)
    In [51]: np.array([2], dtype='int8') + 128
    Out[51]: array([130], dtype=int16)
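
Note also the asymmetry Andrew mentions when the operand is an array
rather than a scalar: the usual promotion rules apply (assuming the
platform default integer is int64):

    np.array([2], dtype='int8') + np.array([127], dtype='int8')
    # -> array([-127], dtype=int8)   (same dtype: rolls over)
    np.array([2], dtype='int8') + np.array([128])
    # -> array([130])                (wider default-int dtype: upcasts)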

If the decision is to always roll over, the first thing to decide is
whether this means the scalar is downcast, or the output of the
computation. It doesn't matter for +, but for instance for the
"maximum" ufunc, I don't think it makes sense to perform the
computation at higher precision and then downcast the output, as you
would otherwise have:
    np.maximum(np.ones(1, dtype='int8'), 128) == [-128]
So out of consistency (across ufuncs) I think it should always
downcast the scalar (which has the advantage of being more efficient
too, since you don't need to do an upcast to perform the computation).
But then you're in for a nasty surprise if your scalar overflows and
you didn't expect it. For instance, the "maximum" example above would
return [1], which may be expected... or not (maybe you wanted to
obtain [128] instead?).
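
To make the two orders concrete, here is a sketch that emulates each
rule with explicit casts:

    import numpy as np

    a = np.ones(1, dtype='int8')

    # Downcast the scalar first: 128 rolls over to -128 in int8, so
    # the original array value wins.
    s = np.int64(128).astype('int8')                    # -> -128
    np.maximum(a, s)                                    # -> array([1], dtype=int8)

    # Compute at higher precision, then downcast the output:
    # maximum(1, 128) == 128, which rolls over to -128 in int8.
    np.maximum(a.astype('int64'), 128).astype('int8')   # -> array([-128], dtype=int8)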

Another solution is to forget about trying to be smart and always
upcast the operation. That would be my second preferred solution, but
it would make it very annoying to deal with Python scalars (typically
int64 / float64), which would upcast lots of things and potentially
break a significant amount of existing code.
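
For instance, one can emulate that rule with a 1-element array (which
already follows regular promotion) to see the kind of silent
upcasting every Python scalar would then trigger:

    import numpy as np

    a = np.ones(3, dtype='int8')

    (a + 1).dtype              # int8: today the Python int does not upcast
    (a + np.array([1])).dtype  # int64: regular promotion against a default-int array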

So, personally, I don't see a straightforward solution without a
warning/error that would be safe enough for programmers.

-=- Olivier

>
> Nathaniel asked:
>
>> But if this is something you're running into in practice then you
>> may have a better idea than us about the practical effects. Do you
>> have any examples where this has come up that you can share?
>
> The only time I really ran into the 1.5/1.6 change was some old code
> ported from IDL which did odd things with the wrapping behavior.  But
> what I'm really trying to get a handle on here is the proposed future
> behavior.  I am coming to this from the perspective of both a user and
> a library developer (h5py) trying to work out what, if anything, I
> have to do when handling arrays and values I get from users.
>
> Andrew

