[Numpy-discussion] Bytes vs. Unicode in Python3

Pauli Virtanen pav@iki...
Thu Nov 26 17:08:18 CST 2009


Porting to Python 3 requires deciding what should be Bytes and
what should be Unicode.

I'm currently taking the following approach. Comments?


dtype field names

        May be either Bytes or Unicode.
        However, 'a' and b'a' are *different* fields.

        The issue is that:
            Python 2: {'a': 2}[u'a'] == 2, {u'a': 2}['a'] == 2
            Python 3: {'a': 2}[b'a'] and {b'a': 2}['a'] raise KeyError
        so the assumption in the current C code that u'a' == b'a'
        no longer holds.
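        A minimal Python 3 sketch of the lookup problem. It uses a plain
        dict as a hypothetical stand-in for a dtype's field table; the
        names `fields`, `str_only` and `lookup` are illustrative and are
        not NumPy API. A str key and a bytes key coexist as two separate
        entries, and a cross-type lookup does not find the other one.

```python
# Plain dict standing in (hypothetically) for a dtype's field-name table.
fields = {'a': 2, b'a': 3}

# In Python 3, str and bytes never compare equal, so these are two fields.
assert 'a' != b'a'
assert len(fields) == 2


def lookup(table, name):
    """Return the value stored under `name`, or None if that exact key is absent."""
    try:
        return table[name]
    except KeyError:
        # Python 3 does not coerce between str and bytes on lookup.
        return None


str_only = {'a': 2}
print(lookup(str_only, 'a'))   # 2
print(lookup(str_only, b'a'))  # None: b'a' is not the key 'a'
```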

dtype titles

        If Bytes or Unicode, they behave the same way as field names.

dtype format strings, datetime tuple, and any other "protocol" strings

        Stored as Bytes. Users can pass in Unicode, but it is converted
        to Bytes using the UTF-8 codec.
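        A minimal sketch of the conversion rule, assuming a hypothetical
        helper `to_protocol_bytes` (not an existing NumPy function).
        Bytes pass through unchanged, Unicode is encoded as UTF-8, and
        anything else is rejected.

```python
def to_protocol_bytes(value):
    """Normalize a protocol string (e.g. a dtype format string) to bytes.

    Hypothetical helper: bytes pass through unchanged, str is encoded
    with UTF-8, and any other type raises TypeError.
    """
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode('utf-8')
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


print(to_protocol_bytes('<i4'))   # b'<i4'
print(to_protocol_bytes(b'<f8'))  # b'<f8'
```

        Encoding on input means the C side only ever has to handle one
        type. For ASCII format strings such as '<i4', UTF-8 produces the
        same bytes as ASCII would.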

        This will likely change the repr() of various objects. Is that
        acceptable?
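        The repr() change comes from bytes carrying a b'' prefix in
        Python 3. The sketch below uses '<i4' as an arbitrary example
        format string. The `dtype(...)` text it prints is a hypothetical
        repr that embeds the stored value, not NumPy's actual dtype
        repr.

```python
# In Python 3 the repr of bytes carries a b'' prefix, so any repr()
# that embeds a stored bytes value will show it.
fmt_str = '<i4'
fmt_bytes = fmt_str.encode('utf-8')

print(repr(fmt_str))             # '<i4'
print(repr(fmt_bytes))           # b'<i4'
print(f"dtype({fmt_bytes!r})")   # dtype(b'<i4')  (hypothetical repr format)
```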

Pauli Virtanen

