[Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
Travis Oliphant
travis@continuum...
Mon Feb 13 22:56:40 CST 2012
These are great suggestions. I am happy to start digging into the code. I'm also happy to re-visit any and all design decisions for NumPy 2.0 (with a strong-eye towards helping people migrate and documenting the results). Mark, I think you have done an excellent job of working with a stodgy group and pushing things forward. That is a rare talent, and the world is a better place because you jumped in.
There is a lot of cruft all over the place, I know. I also know a lot more now than I did 6 years ago about software design :-) I'm very excited about what we are going to be able to do with NumPy together --- and with the others in the community.
But, I am also aware of *a lot* of users who never voice their opinion on this list, and a lot of features that they want and need and are currently working around the limitations of NumPy to get. These are going to be my primary focus for the rest of the 1.X series. I see at least a NumPy 1.8 at this point with maybe even a NumPy 1.9. At the same time, I am looking forward to working with you and others in the community as you lead the push toward NumPy 2.0 (which I hope is not delayed too long with all the possible discussions that can take place :-) )
Best regards,
-Travis
On Feb 13, 2012, at 10:31 PM, Mark Wiebe wrote:
> On Mon, Feb 13, 2012 at 8:04 PM, Travis Oliphant <travis@continuum.io> wrote:
> I disagree with your assessment of the subscript operator, but I'm sure we will have plenty of time to discuss that. I don't think it's correct to compare the corner cases of the fancy indexing and regular indexing to the corner cases of type coercion system. If you recall, I was quite nervous about all the changes you made to the coercion rules because I didn't believe you fully understood what had been done before and I knew there was not complete test coverage.
>
> It is true that both systems have emerged from a long history and could definitely use fresh perspectives which we all appreciate you and others bringing. It is also true that few are aware of the details of how things are actually implemented and that there are corner cases that are basically defined by the algorithm used (this is more true of the type-coercion system than fancy-indexing, however).
>
> Likely the only way we will be able to know for certain the extent to which our opinions are accurate is to actually dig into the code. I think we can agree, however, that at the very least it could use some performance improvement. :)
>
> I think it would have been wise to write those extensive tests prior to writing new code. I'm curious if what you were expecting for the output was derived from what earlier versions of NumPy produced. NumPy has never been in a state where you could just re-factor at will and assume that tests will catch all intended use cases. Numeric before it was not in that state either. This is a good goal, and we always welcome new tests. It just takes a lot of time and a lot of tedious work that the volunteer labor to this point have not had the time to do.
>
> I did put quite a bit of effort into maintaining compatibility, and was incredibly careful about the change we're discussing. I used something I suspect you created, the "can cast safely" table here:
>
> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules
>
> I extended it to more cases including scalar/array combinations of type promotion, and validated that 1.5 and 1.6 produced the same outputs. The script I used is here:
>
> https://github.com/numpy/numpy/blob/master/numpy/testing/print_coercion_tables.py
>
> I definitely didn't jump into the change blind, but I did approach it from a clean perspective with the willingness to try and make things better. I understand this is a delicate balance to walk, and I'd like to stress that I didn't take any of the changes I made here lightly.
>
> Very few of us have ever been paid to work on NumPy directly and have often been trying to fit in improvements to the code base between other jobs we are supposed to be doing. Of course, you and I are hoping to change that this year and look forward to the code quality improving commensurately.
>
> Well, everything I did for 1.6 that we're discussing here was volunteer work too. :)
>
> You and Enthought have all the credit for the later bit where I did get paid a little bit to do the datetime64 and NA stuff!
>
> Thanks for all you are doing. I also agree that Rolf and Charles have-been and are invaluable in the maintenance and progress of NumPy and SciPy. They deserve as much praise and kudos as anyone can give them.
>
> It's great to have you back and active in the community again too. I'm sure this is improving the moods of many NumPy and SciPy users.
>
> -Mark
>
>
> -Travis
>
>
>
> On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote:
>
>> I believe the main lessons to draw from this are just how incredibly important a complete test suite and staying on top of code reviews are. I'm of the opinion that any explicit design choice of this nature should be reflected in the test suite, so that if someone changes it years later, they get immediate feedback that they're breaking something important. NumPy has gradually increased its test suite coverage, and when I dealt with the type promotion subsystem, I added fairly extensive tests:
>>
>> https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345
>>
>> Another subsystem which is in a similar state as what the type promotion subsystem was, is the subscript operator and how regular/fancy indexing work. What this means is that any attempt to improve it that doesn't coincide with the original intent years ago can easily break things that were originally intended without them being caught by a test. I believe this subsystem needs improvement, and the transition to new/improved code will probably be trickier to manage than for the dtype promotion case.
>>
>> Let's try to learn from the type promotion case as best we can, and use it to improve NumPy's process. I believe Charles and Ralph have been doing a great job of enforcing high standards in new NumPy code, and managing the release process in a way that has resulted in very few bugs and regressions in the release. Most of these quality standards are still informal, however, and it's probably a good idea to write them down in a canonical location. It will be especially helpful for newcomers, who can treat the standards as a checklist before submitting pull requests.
>>
>> Thanks,
>> -Mark
>>
>> On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant <travis@continuum.io> wrote:
>> The problem is that these sorts of things take a while to emerge. The original system was more consistent than I think you give it credit. What you are seeing is that most people get NumPy from distributions and are relying on us to keep things consistent.
>>
>> The scalar coercion rules were deterministic and based on the idea that a scalar does not determine the output dtype unless it is of a different kind. The new code changes that unfortunately.
>>
>> Another thing I noticed is that I thought that int16 <op> scalar float would produce float32 originally. This seems to have changed, but I need to check on an older version of NumPy.
>>
>> Changing the scalar coercion rules is an unfortunate substantial change in semantics and should not have happened in the 1.X series.
>>
>> I understand you did not get a lot of feedback and spent a lot of time on the code which we all appreciate. I worked to stay true to the Numeric casting rules incorporating the changes to prevent scalar upcasting due to the absence of single precision Numeric literals in Python.
>>
>> We will need to look in detail at what has changed. I will write a test to do that.
>>
>> Thanks,
>>
>> Travis
>>
>> --
>> Travis Oliphant
>> (on a mobile)
>> 512-826-7480
>>
>>
>> On Feb 13, 2012, at 7:58 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
>>
>>> On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant <travis@continuum.io> wrote:
>>> Hmmm. This seems like a regression. The scalar casting API was fairly intentional.
>>>
>>> What is the reason for the change?
>>>
>>> In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite this subsystem. There were virtually no tests in the test suite specifying what the expected behavior should be, and there were clear inconsistencies where for example "a+b" could result in a different type than "b+a". I recall there being some bugs in the tracker related to this as well, but I don't remember those details.
>>>
>>> This change felt like an obvious extension of an existing behavior for eliminating overflow, where the promotion changed unsigned -> signed based on the value of the scalar. This change introduced minimal upcasting only in a set of cases where an overflow was guaranteed to happen without that upcasting.
>>>
>>> During the 1.6 beta period, I signaled that this subsystem had changed, as the bullet point starting "The ufunc uses a more consistent algorithm for loop selection.":
>>>
>>> http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html
>>>
>>> The behavior Matthew has observed is a direct result of how I designed the minimization function mentioned in that bullet point, and the algorithm for it is documented in the 'Notes' section of the result_type page:
>>>
>>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html
>>>
>>> Hopefully that explains it well enough. I made the change intentionally and carefully, tested its impact on SciPy and other projects, and advocated for it during the release cycle.
>>>
>>> Cheers,
>>> Mark
>>>
>>> --
>>> Travis Oliphant
>>> (on a mobile)
>>> 512-826-7480
>>>
>>>
>>> On Feb 13, 2012, at 6:25 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>>
>>> > Hi,
>>> >
>>> > I recently noticed a change in the upcasting rules in numpy 1.6.0 /
>>> > 1.6.1 and I just wanted to check it was intentional.
>>> >
>>> > For all versions of numpy I've tested, we have:
>>> >
>>> >>>> import numpy as np
>>> >>>> Adata = np.array([127], dtype=np.int8)
>>> >>>> Bdata = np.int16(127)
>>> >>>> (Adata + Bdata).dtype
>>> > dtype('int8')
>>> >
>>> > That is - adding an integer scalar of a larger dtype does not result
>>> > in upcasting of the output dtype, if the data in the scalar type fits
>>> > in the smaller.
>>> >
>>> > For numpy < 1.6.0 we have this:
>>> >
>>> >>>> Bdata = np.int16(128)
>>> >>>> (Adata + Bdata).dtype
>>> > dtype('int8')
>>> >
>>> > That is - even if the data in the scalar does not fit in the dtype of
>>> > the array to which it is being added, there is no upcasting.
>>> >
>>> > For numpy >= 1.6.0 we have this:
>>> >
>>> >>>> Bdata = np.int16(128)
>>> >>>> (Adata + Bdata).dtype
>>> > dtype('int16')
>>> >
>>> > There is upcasting...
>>> >
>>> > I can see why the numpy 1.6.0 way might be preferable but it is an API
>>> > change I suppose.
>>> >
>>> > Best,
>>> >
>>> > Matthew
>>> > _______________________________________________
>>> > NumPy-Discussion mailing list
>>> > NumPy-Discussion@scipy.org
>>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20120213/7833a544/attachment-0001.html
More information about the NumPy-Discussion
mailing list