[Numpy-discussion] Status of NumPy and Python 3.3
Christoph Gohlke
cgohlke@uci....
Sat Jul 28 20:17:04 CDT 2012
On 7/28/2012 6:09 PM, Ondřej Čertík wrote:
> On Sat, Jul 28, 2012 at 5:09 PM, Ondřej Čertík <ondrej.certik@gmail.com> wrote:
>> On Sat, Jul 28, 2012 at 3:31 PM, Ondřej Čertík <ondrej.certik@gmail.com> wrote:
>>> On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík <ondrej.certik@gmail.com> wrote:
>>>> Many of the failures in
>>>> https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
>>>> are of the type:
>>>>
>>>> ======================================================================
>>>> FAIL: Check byteorder of single-dimensional objects
>>>> ----------------------------------------------------------------------
>>>> Traceback (most recent call last):
>>>> File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py",
>>>> line 286, in test_valuesSD
>>>> self.assertTrue(ua[0] != ua2[0])
>>>> AssertionError: False is not true
>>>>
>>>>
>>>> and those are caused by the following minimal example:
>>>>
>>>> Python 3.2:
>>>>
>>>>>>> from numpy import array
>>>>>>> a = array(["abc"])
>>>>>>> b = a.newbyteorder()
>>>>>>> a.dtype
>>>> dtype('<U3')
>>>>>>> b.dtype
>>>> dtype('>U3')
>>>>>>> a[0].dtype
>>>> dtype('<U3')
>>>>>>> b[0].dtype
>>>> dtype('<U6')
>>>>>>> a[0] == b[0]
>>>> False
>>>>>>> a[0]
>>>> 'abc'
>>>>>>> b[0]
>>>> 'ៀ\udc00埀\udc00韀\udc00'
>>>>
>>>>
>>>> Python 3.3:
>>>>
>>>>
>>>>>>> from numpy import array
>>>>>>> a = array(["abc"])
>>>>>>> b = a.newbyteorder()
>>>>>>> a.dtype
>>>> dtype('<U3')
>>>>>>> b.dtype
>>>> dtype('>U3')
>>>>>>> a[0].dtype
>>>> dtype('<U3')
>>>>>>> b[0].dtype
>>>> dtype('<U3')
>>>>>>> a[0] == b[0]
>>>> True
>>>>>>> a[0]
>>>> 'abc'
>>>>>>> b[0]
>>>> 'abc'
>>>>
>>>>
>>>> So somehow the newbyteorder() method doesn't change the dtype of the
>>>> elements in our new code.
>>>> This method is implemented in numpy/core/src/multiarray/descriptor.c
>>>> (I think), but so far I don't see
>>>> where the problem could be.
>>>>
>>>> Any ideas?
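For reference, here is what a `'>U3'` view of the same memory should see. The sketch below uses only the Python standard library (not NumPy). It assumes a little-endian machine, where `array(["abc"])` stores each character as one 4-byte little-endian UTF-32 code unit. Reading those 12 bytes as big-endian gives values far above U+10FFFF, so `a[0]` and `b[0]` are expected to differ, which is what `test_valuesSD` asserts.

```python
import struct

# Bytes of the single element of array(["abc"]) with dtype '<U3' on a
# little-endian machine: three 4-byte UTF-32 code units, no BOM.
raw = "abc".encode("utf-32-le")
assert len(raw) == 12

# How a '<U3' reader interprets the element: the intended code points.
as_little = struct.unpack("<3I", raw)

# How a '>U3' view of the same memory interprets it.
as_big = struct.unpack(">3I", raw)

print([hex(cp) for cp in as_little])  # ['0x61', '0x62', '0x63']
print([hex(cp) for cp in as_big])     # ['0x61000000', '0x62000000', '0x63000000']
```

None of the big-endian values is a valid code point (the maximum is 0x10FFFF), so getting `'abc'` back for `b[0]` under Python 3.3 suggests that the byte swap was skipped when the scalar was created.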
>>>
>>> Ok, after some investigating, I think we need to do something along these lines:
>>>
>>> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
>>> index c134aed..daf7fc4 100644
>>> --- a/numpy/core/src/multiarray/scalarapi.c
>>> +++ b/numpy/core/src/multiarray/scalarapi.c
>>> @@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>>> #if PY_VERSION_HEX >= 0x03030000
>>> if (type_num == NPY_UNICODE) {
>>> PyObject *b, *args;
>>> - b = PyBytes_FromStringAndSize(data, itemsize);
>>> + if (swap) {
>>> + char *buffer;
>>> + buffer = malloc(itemsize);
>>> + if (buffer == NULL) {
>>> + return PyErr_NoMemory();
>>> + }
>>> + memcpy(buffer, data, itemsize);
>>> + byte_swap_vector(buffer, itemsize, 4);
>>> + b = PyBytes_FromStringAndSize(buffer, itemsize);
>>> + // We have to deallocate this later, otherwise we get a segfault...
>>> + //free(buffer);
>>> + } else {
>>> + b = PyBytes_FromStringAndSize(data, itemsize);
>>> + }
>>> if (b == NULL) {
>>> return NULL;
>>> }
>>>
>>> This particular implementation still fails though:
>>>
>>>
>>>>>> from numpy import array
>>>>>> a = array(["abc"])
>>>>>> b = a.newbyteorder()
>>>>>> a.dtype
>>> dtype('<U3')
>>>>>> b.dtype
>>> dtype('>U3')
>>>>>> a[0].dtype
>>> dtype('<U3')
>>>>>> b[0].dtype
>>> Traceback (most recent call last):
>>> File "<stdin>", line 1, in <module>
>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
>>> codepoint not in range(0x110000)
>>>>>> a[0] == b[0]
>>> Traceback (most recent call last):
>>> File "<stdin>", line 1, in <module>
>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
>>> codepoint not in range(0x110000)
>>>>>> a[0]
>>> 'abc'
>>>>>> b[0]
>>> Traceback (most recent call last):
>>> File "<stdin>", line 1, in <module>
>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
>>> codepoint not in range(0x110000)
>>>
>>>
>>>
>>> But I think that we simply need to take into account the "swap" flag.
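A note on the count argument: the Python 3.2 code path calls `byte_swap_vector(buffer, itemsize >> 2, 4)`, which suggests that the second argument is a number of 4-byte items, not a number of bytes. The pure-Python model below is hypothetical (it is not NumPy's implementation). It adds a bounds check to show that passing `itemsize` instead would touch four times as many bytes as the copied element holds. In C that would be an overrun of the `malloc`ed buffer, which may explain the crash mentioned in the comment of the patch above.

```python
def byte_swap_vector(buf: bytearray, n: int, size: int) -> None:
    """Hypothetical model: reverse the bytes of each of the first n
    items of `size` bytes in buf, in place."""
    if n * size > len(buf):
        # A C version without this check would write past the end of buf.
        raise ValueError(
            f"{n} items of {size} bytes need {n * size} bytes, buffer has {len(buf)}"
        )
    for start in range(0, n * size, size):
        buf[start:start + size] = buf[start:start + size][::-1]


# One '>U3' element: 12 bytes, i.e. three 4-byte code units.
itemsize = 12
buffer = bytearray("abc".encode("utf-32-be"))  # private copy, like malloc + memcpy

byte_swap_vector(buffer, itemsize >> 2, 4)     # 3 items of 4 bytes each
print(bytes(buffer).decode("utf-32-le"))       # abc
```

Calling `byte_swap_vector(buffer, itemsize, 4)` on the same 12-byte buffer raises the `ValueError` in this model, since it asks for 48 bytes.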
>>
>> Ok, so first of all, I tried to disable the swapping in Python 3.2:
>>
>> if (swap) {
>> byte_swap_vector(buffer, itemsize >> 2, 4);
>> }
>>
>> And then it behaves *exactly* as in Python 3.3. So I am pretty sure
>> that the problem is right there and something
>> along the lines of my patch above should fix it. I had a few bugs
>> there; here is the corrected version:
>>
>> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
>> index c134aed..bed73f7 100644
>> --- a/numpy/core/src/multiarray/scalarapi.c
>> +++ b/numpy/core/src/multiarray/scalarapi.c
>> @@ -644,7 +644,19 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>> #if PY_VERSION_HEX >= 0x03030000
>> if (type_num == NPY_UNICODE) {
>> PyObject *b, *args;
>> - b = PyBytes_FromStringAndSize(data, itemsize);
>> + if (swap) {
>> + char *buffer;
>> + buffer = malloc(itemsize);
>> + if (buffer == NULL) {
>> + return PyErr_NoMemory();
>> + }
>> + memcpy(buffer, data, itemsize);
>> + byte_swap_vector(buffer, itemsize >> 2, 4);
>> + b = PyBytes_FromStringAndSize(buffer, itemsize);
>> + free(buffer);
>> + } else {
>> + b = PyBytes_FromStringAndSize(data, itemsize);
>> + }
>> if (b == NULL) {
>> return NULL;
>> }
>>
>>
>> That works well, except that it gives the UnicodeDecodeError:
>>
>>>>> b[0].dtype
>> NULL
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
>> codepoint not in range(0x110000)
>>
>> This error is actually triggered by this line:
>>
>>
>> obj = type->tp_new(type, args, NULL);
>>
>> in the patch by Stefan above. So I think what is happening is that it
>> simply tries to convert it from bytes
>> to a string and fails. That makes sense. The question is why
>> it doesn't fail in exactly the same way
>> in Python 3.2. I think it's because the conversion check is bypassed
>> somehow. Stefan, I think
>> we need to swap it after the object is created. I am still
>> experimenting with this.
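The decode failure can be reproduced with the standard library alone, under two assumptions: a little-endian machine, and a scalar constructor that decodes its bytes as native-order UTF-32 (the traceback names the 'utf32' codec). After the swap, the copy holds big-endian code units; read as little-endian, the first unit is 0x61000000, which is above U+10FFFF, hence "not in range(0x110000)". The same bytes decode cleanly when read as big-endian.

```python
# Element bytes of array(["abc"]) ('<U3') on a little-endian machine.
raw = "abc".encode("utf-32-le")

# Reverse each 4-byte code unit, as the swap branch of the patch does.
swapped = b"".join(raw[i:i + 4][::-1] for i in range(0, len(raw), 4))

try:
    # Native-order (little-endian) decode of the swapped copy.
    swapped.decode("utf-32-le")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)

# Decoding with the byte order the data is actually in succeeds.
print(swapped.decode("utf-32-be"))  # abc
```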
>
> Well, I looked through the Python sources and implemented a solution
> that works; it is in this commit:
>
> https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27654
>
> So now the PR actually seems to work. The rest of the failures are here:
>
> https://gist.github.com/3195520
>
> and they seem to be unrelated. Can somebody please review this PR?
>
> https://github.com/numpy/numpy/pull/366
>
>
> I will squash the commits after it's reviewed (I want to keep the
> history there for now).
>
>
> Ondrej

Thank you. I backported the PR to numpy 1.6.2 and it works for me on
win-amd64-py3.3 with the msvc10 compiler. I get the same 5 test failures
of the kind:

    AssertionError:
    Items are not equal:
     ACTUAL: ()
     DESIRED: None

Christoph