[Numpy-discussion] Status of NumPy and Python 3.3

Ondřej Čertík ondrej.certik@gmail....
Sat Jul 28 09:58:57 CDT 2012


On Sat, Jul 28, 2012 at 2:36 AM, Stefan Krah <stefan-usenet@bytereef.org> wrote:
> Ondřej Čertík <ondrej.certik@gmail.com> wrote:
>> >> I took a brief look at it, and from the errors I have seen, one is
>> >> cosmetic, the other one is a bit more involved (rewriting
>> >> PyArray_Scalar unicode support). While it is not difficult in nature,
>> >> the current code has multiple #ifdef of Py_UNICODE_WIDE, meaning it
>> >> would require multiple configurations on multiple python versions to
>> >> be tested.
> The cleanest way might be to leave the existing code in place and write
> completely new and independent code for Python 3.3.
>> https://github.com/numpy/numpy/pull/366
>> It's a work in progress; I still have some minor issues. See the
>> PR for up-to-date details.
> I'm not a Unicode expert, but I think it's best to avoid Py_UNICODE altogether.

I think so too.

> What should matter in 3.3 is the maximum character in a Unicode string that
> determines the kind of the string:
>    PyUnicode_1BYTE_KIND  ->  Py_UCS1
>    PyUnicode_2BYTE_KIND  ->  Py_UCS2
>    PyUnicode_4BYTE_KIND  ->  Py_UCS4
> So Py_UNICODE_WIDE should not matter as all builds support PyUnicode_4BYTE_KIND.
> That's why I /think/ it's possible to drop Py_UNICODE altogether. For instance,
> the line in https://github.com/certik/numpy/commit/d02e36e5c85d5ee444614254643037aafc8deccc
> should probably be:
>   itemsize = PyUnicode_GetLength(robj) * PyUnicode_KIND(robj)

Yes, I think that's it. I've changed it and pushed the change to the PR.
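
To make the suggested formula concrete, here is a rough pure-Python sketch of what PyUnicode_GetLength(robj) * PyUnicode_KIND(robj) works out to under the 3.3 string representation. The helper names unicode_kind and compact_itemsize are made up for illustration and are not part of NumPy or the C API; the sketch only assumes that the kind is determined by the largest code point in the string, as described above.

```python
def unicode_kind(s):
    """Bytes per code point, mirroring the three kinds listed above.

    Hypothetical helper: 1 -> Py_UCS1, 2 -> Py_UCS2, 4 -> Py_UCS4.
    """
    if not s:
        return 1  # an empty string needs nothing wider than one byte
    max_cp = max(map(ord, s))
    if max_cp < 0x100:
        return 1
    if max_cp < 0x10000:
        return 2
    return 4


def compact_itemsize(s):
    """Number of code points times bytes per code point (hypothetical helper)."""
    return len(s) * unicode_kind(s)


if __name__ == "__main__":
    for text in ["abc", "\u0159", "a\U0001F600"]:
        print(repr(text), unicode_kind(text), compact_itemsize(text))
```

The result depends on the contents of the string, not on how Python was built, which is why Py_UNICODE_WIDE drops out of the picture. Whether this per-string size is the right itemsize for the array scalar is something the PR still has to settle.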

I am now seeing failures like these:

ERROR: test_rmul (test_defchararray.TestOperations)
Traceback (most recent call last):
  File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_defchararray.py", line 592, in test_rmul
    Ar = np.array([[A[0,0]*r, A[0,1]*r],
  File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/defchararray.py", line 1916, in __getitem__
    if issubclass(val.dtype.type, character) and not _len(val) == 0:
AttributeError: 'str' object has no attribute 'dtype'

Here is the code in defchararray.py:

1911 	        if not _globalvar and self.dtype.char not in 'SUbc':
1912 	            raise ValueError("Can only create a chararray from string data.")
1914 	    def __getitem__(self, obj):
1915 	        val = ndarray.__getitem__(self, obj)
1916 ->	        if issubclass(val.dtype.type, character) and not _len(val) == 0:
1917 	            temp = val.rstrip()
1918 	            if _len(temp) == 0:
1919 	                val = ''
1920 	            else:
1921 	                val = temp

and here is some debugging info:

(Pdb) p self
(Pdb) p obj
(0, 0)
(Pdb) p val
(Pdb) p type(val)
<class 'str'>

So "val" is a Python string, which of course doesn't have .dtype. What
I don't understand yet is why

val = ndarray.__getitem__(self, obj)

returns a Python string. I spent a few hours debugging it
yesterday, but so far no luck.

Then there are failures in test_unicode.py of the following type:

FAIL: Check byteorder of single-dimensional objects
Traceback (most recent call last):
  File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py", line 286, in test_valuesSD
    self.assertTrue(ua[0] != ua2[0])
AssertionError: False is not true

I haven't dug into those yet.

If anyone has any ideas, let me know.


