[Numpy-discussion] Regression: in-place operations (possibly intentional)

Benjamin Root ben.root@ou....
Tue Sep 18 14:35:10 CDT 2012


On Tue, Sep 18, 2012 at 3:25 PM, Charles R Harris <charlesr.harris@gmail.com
> wrote:

>
>
> On Tue, Sep 18, 2012 at 1:13 PM, Benjamin Root <ben.root@ou.edu> wrote:
>
>>
>>
>> On Tue, Sep 18, 2012 at 2:47 PM, Charles R Harris <
>> charlesr.harris@gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root <ben.root@ou.edu> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris <
>>>> charlesr.harris@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant <travis@continuum.io>wrote:
>>>>>
>>>>>>
>>>>>> On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote:
>>>>>>
>>>>>> > Consider the following code:
>>>>>> >
>>>>>> > import numpy as np
>>>>>> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>>>> > a *= float(255) / 15
>>>>>> >
>>>>>> > In v1.6.x, this yields:
>>>>>> > array([17, 34, 51, 68, 85], dtype=int16)
>>>>>> >
>>>>>> > But in master, this throws an exception about failing to cast via
>>>>>> same_kind.
>>>>>> >
>>>>>> > Note that numpy was smart about this operation before, consider:
>>>>>> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>>>> > a *= float(128) / 256
>>>>>>
>>>>>> > yields:
>>>>>> > array([0, 1, 1, 2, 2], dtype=int16)
>>>>>> >
>>>>>> > Of course, this is different than if one does it in a non-in-place
>>>>>> manner:
>>>>>> > np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5
>>>>>> >
>>>>>> > which yields an array with floating point dtype in both versions.
>>>>>>  I can appreciate the arguments for preventing this kind of implicit
>>>>>> casting between non-same_kind dtypes, but I argue that because the
>>>>>> operation is in-place, then I (as the programmer) am explicitly stating
>>>>>> that I desire to utilize the current array to store the results of the
>>>>>> operation, dtype and all.  Obviously, we can't completely turn off this
>>>>>> rule (for example, an in-place addition between integer array and a
>>>>>> datetime64 makes no sense), but surely there is some sort of happy medium
>>>>>> that would allow these sort of operations to take place?
>>>>>> >
>>>>>> > Lastly, if it is determined that it is desirable to allow in-place
>>>>>> operations to continue working like they have before, I would like to see
>>>>>> such a fix in v1.7 because if it isn't in 1.7, then other libraries (such
>>>>>> as matplotlib, where this issue was first found) would have to change their
>>>>>> code anyway just to be compatible with numpy.
>>>>>>
>>>>>> I agree that in-place operations should allow different casting
>>>>>> rules.  There are different opinions on this, of course, but generally this
>>>>>> is how NumPy has worked in the past.
>>>>>>
>>>>>> We did decide to change the default casting rule to "same_kind" but
>>>>>> making an exception for in-place seems reasonable.
>>>>>>
>>>>>
>>>>> I think that in these cases same_kind will flag what are most likely
>>>>> programming errors and sloppy code. It is easy to be explicit and doing so
>>>>> will make the code more readable because it will be immediately obvious
>>>>> what the multiplicand is without the need to recall what the numpy casting
>>>>> rules are in this exceptional case. IISTR several mentions of this before
>>>>> (Gael?), and in some of those cases it turned out that bugs were being
>>>>> turned up. Catching bugs with minimal effort is a good thing.
>>>>>
>>>>> Chuck
>>>>>
>>>>>
>>>> True, it is quite likely to be a programming error, but then again,
>>>> there are many cases where it isn't.  Is the problem strictly that we are
>>>> trying to downcast the float to an int, or is it that we are trying to
>>>> downcast to a lower precision?  Is there a way for one to explicitly relax
>>>> the same_kind restriction?
>>>>
>>>
>>> I think the problem is down casting across kinds, with the result that
>>> floats are truncated and the imaginary parts of imaginaries might be
>>> discarded. That is, the value, not just the precision, of the rhs changes.
>>> So I'd favor an explicit cast in code like this, i.e., cast the rhs to an
>>> integer.
>>>
>>> It is true that this forces downstream to code up to a higher standard,
>>> but I don't see that as a bad thing, especially if it exposes bugs. And it
>>> isn't difficult to fix.
>>>
>>> Chuck
>>>
>>>
>> Mind you, in my case, casting the rhs as an integer before doing the
>> multiplication would be a bug, since our value for the rhs is usually
>> between zero and one.  Multiplying first by the integer numerator before
>> dividing by the integer denominator would likely cause issues with
>> overflowing the 16 bit integer.
>>
>>
> For the case in point I'd do
>
> In [1]: a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>
> In [2]: a //= 2
>
> In [3]: a
> Out[3]: array([0, 1, 1, 2, 2], dtype=int16)
>
> Although I expect you would want something different in practice. But the
> current code already looks fragile to me and I think it is a good thing you
> are taking a closer look at it. If you really intend going through a float,
> then it should be something like
>
> a = (a*(float(128)/256)).astype(int16)
>
> Chuck
>
>
And thereby losing the memory benefit of an in-place multiplication?  That
is sort of the point of all this.  We are using 16 bit integers because we
wanted to be as efficient as possible and didn't need anything larger.
Note, that is what we changed the code to, I am just wondering if we are
being too cautious.  The casting kwarg looks to be what I might want,
though it isn't as clean as just writing an "*=" statement.

Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20120918/fc7f5d8e/attachment.html 


More information about the NumPy-Discussion mailing list