[Numpy-discussion] Regression: in-place operations (possibly intentional)

Charles R Harris charlesr.harris@gmail....
Tue Sep 18 15:42:10 CDT 2012


On Tue, Sep 18, 2012 at 2:33 PM, Travis Oliphant <travis@continuum.io>wrote:

>
> On Sep 18, 2012, at 2:44 PM, Charles R Harris wrote:
>
>
>
> On Tue, Sep 18, 2012 at 1:35 PM, Benjamin Root <ben.root@ou.edu> wrote:
>
>>
>>
>> On Tue, Sep 18, 2012 at 3:25 PM, Charles R Harris <
>> charlesr.harris@gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Sep 18, 2012 at 1:13 PM, Benjamin Root <ben.root@ou.edu> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Sep 18, 2012 at 2:47 PM, Charles R Harris <
>>>> charlesr.harris@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root <ben.root@ou.edu>wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris <
>>>>>> charlesr.harris@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant <
>>>>>>> travis@continuum.io> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote:
>>>>>>>>
>>>>>>>> > Consider the following code:
>>>>>>>> >
>>>>>>>> > import numpy as np
>>>>>>>> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>>>>>> > a *= float(255) / 15
>>>>>>>> >
>>>>>>>> > In v1.6.x, this yields:
>>>>>>>> > array([17, 34, 51, 68, 85], dtype=int16)
>>>>>>>> >
>>>>>>>> > But in master, this throws an exception about failing to cast via
>>>>>>>> same_kind.
>>>>>>>> >
>>>>>>>> > Note that numpy was smart about this operation before, consider:
>>>>>>>> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>>>>>> > a *= float(128) / 256
>>>>>>>>
>>>>>>>> > yields:
>>>>>>>> > array([0, 1, 1, 2, 2], dtype=int16)
>>>>>>>> >
>>>>>>>> > Of course, this is different than if one does it in a
>>>>>>>> non-in-place manner:
>>>>>>>> > np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5
>>>>>>>> >
>>>>>>>> > which yields an array with floating point dtype in both versions.
>>>>>>>>  I can appreciate the arguments for preventing this kind of implicit
>>>>>>>> casting between non-same_kind dtypes, but I argue that because the
>>>>>>>> operation is in-place, then I (as the programmer) am explicitly stating
>>>>>>>> that I desire to utilize the current array to store the results of the
>>>>>>>> operation, dtype and all.  Obviously, we can't completely turn off this
>>>>>>>> rule (for example, an in-place addition between integer array and a
>>>>>>>> datetime64 makes no sense), but surely there is some sort of happy medium
>>>>>>>> that would allow these sort of operations to take place?
>>>>>>>> >
>>>>>>>> > Lastly, if it is determined that it is desirable to allow
>>>>>>>> in-place operations to continue working like they have before, I would like
>>>>>>>> to see such a fix in v1.7 because if it isn't in 1.7, then other libraries
>>>>>>>> (such as matplotlib, where this issue was first found) would have to change
>>>>>>>> their code anyway just to be compatible with numpy.
>>>>>>>>
>>>>>>>> I agree that in-place operations should allow different casting
>>>>>>>> rules.  There are different opinions on this, of course, but generally this
>>>>>>>> is how NumPy has worked in the past.
>>>>>>>>
>>>>>>>> We did decide to change the default casting rule to "same_kind" but
>>>>>>>> making an exception for in-place seems reasonable.
>>>>>>>>
>>>>>>>
>>>>>>> I think that in these cases same_kind will flag what are most likely
>>>>>>> programming errors and sloppy code. It is easy to be explicit and doing so
>>>>>>> will make the code more readable because it will be immediately obvious
>>>>>>> what the multiplicand is without the need to recall what the numpy casting
>>>>>>> rules are in this exceptional case. IISTR several mentions of this before
>>>>>>> (Gael?), and in some of those cases it turned out that bugs were being
>>>>>>> turned up. Catching bugs with minimal effort is a good thing.
>>>>>>>
>>>>>>> Chuck
>>>>>>>
>>>>>>>
>>>>>> True, it is quite likely to be a programming error, but then again,
>>>>>> there are many cases where it isn't.  Is the problem strictly that we are
>>>>>> trying to downcast the float to an int, or is it that we are trying to
>>>>>> downcast to a lower precision?  Is there a way for one to explicitly relax
>>>>>> the same_kind restriction?
>>>>>>
>>>>>
>>>>> I think the problem is down casting across kinds, with the result that
>>>>> floats are truncated and the imaginary parts of imaginaries might be
>>>>> discarded. That is, the value, not just the precision, of the rhs changes.
>>>>> So I'd favor an explicit cast in code like this, i.e., cast the rhs to an
>>>>> integer.
>>>>>
>>>>> It is true that this forces downstream to code up to a higher
>>>>> standard, but I don't see that as a bad thing, especially if it exposes
>>>>> bugs. And it isn't difficult to fix.
>>>>>
>>>>> Chuck
>>>>>
>>>>>
>>>> Mind you, in my case, casting the rhs as an integer before doing the
>>>> multiplication would be a bug, since our value for the rhs is usually
>>>> between zero and one.  Multiplying first by the integer numerator before
>>>> dividing by the integer denominator would likely cause issues with
>>>> overflowing the 16 bit integer.
>>>>
>>>>
>>> For the case in point I'd do
>>>
>>> In [1]: a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>
>>> In [2]: a //= 2
>>>
>>> In [3]: a
>>> Out[3]: array([0, 1, 1, 2, 2], dtype=int16)
>>>
>>> Although I expect you would want something different in practice. But
>>> the current code already looks fragile to me and I think it is a good thing
>>> you are taking a closer look at it. If you really intend going through a
>>> float, then it should be something like
>>>
>>> a = (a*(float(128)/256)).astype(int16)
>>>
>>> Chuck
>>>
>>>
>> And thereby losing the memory benefit of an in-place multiplication?
>>
>
> What makes you think you are getting that? I'd have to check the numpy  C
> source, but I expect the multiplication is handled just as I wrote it out.
> I don't recall any loops that handle mixed types likes that. I'd like to
> see some, though, scaling integers is a common problem.
>
>
>
>
>> That is sort of the point of all this.  We are using 16 bit integers
>> because we wanted to be as efficient as possible and didn't need anything
>> larger.  Note, that is what we changed the code to, I am just wondering if
>> we are being too cautious.  The casting kwarg looks to be what I might
>> want, though it isn't as clean as just writing an "*=" statement.
>>
>>
> I think even there you will have an intermediate float array followed by a
> cast.
>
>
> This is true, but it is done in chunks of a fixed size (controllable by a
> thread-local variable or keyword argument to the ufunc).
>
> How difficult would it be to change in-place operations back to the
> "unsafe" default?
>

Probably not too difficult, but I think it would be a mistake. What keyword
argument are you referring to? In the current case, I think what is wanted
is a scaling function that will actually do things in place. The matplotlib
folks would probably be happier with the result if they simply coded up a
couple of small Cython routines to do that.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20120918/6dc55608/attachment.html 


More information about the NumPy-Discussion mailing list