[Numpy-discussion] Bytes vs. Unicode in Python3
Pauli Virtanen
pav@iki...
Thu Nov 26 17:08:18 CST 2009
Hi,
The Python 3 port needs some decisions about what is Bytes and
what is Unicode.
I'm currently taking the following approach. Comments?
***
dtype field names
Either Bytes or Unicode.
But 'a' and b'a' are *different* fields.
The issue is that:
Python 2: {'a': 2}[u'a'] == 2, {u'a': 2}['a'] == 2
Python 3: {'a': 2}[b'a'], {b'a': 2}['a'] raise KeyError
so the current assumption in the C code that u'a' == b'a'
ceases to hold.
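A minimal Python 3 sketch of the lookup behaviour described above, using a plain dict rather than an actual dtype (the variable names are only illustrative):

```python
# Python 3: str and bytes never compare equal, so they are distinct dict keys.
fields = {'a': 2}

assert 'a' != b'a'
assert fields['a'] == 2

# Looking up the bytes key in a str-keyed dict fails with KeyError.
try:
    fields[b'a']
    bytes_lookup_failed = False
except KeyError:
    bytes_lookup_failed = True

# Both spellings can coexist as separate entries in one mapping.
both = {'a': 1, b'a': 2}
```

Under Python 2 the first lookup would succeed, because u'a' and 'a' hash and compare equal there.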
dtype titles
If Bytes or Unicode, these work the same way as field names.
dtype format strings, datetime tuple, and any other "protocol" strings
Bytes. Users can pass in Unicode, but it is converted to Bytes
using the UTF-8 codec.
This will likely change repr() of various objects. Acceptable?
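The conversion rule for "protocol" strings could be sketched in Python 3 roughly as follows. The helper name as_protocol_bytes is hypothetical and is not part of NumPy; the real conversion would live in the C code.

```python
def as_protocol_bytes(value):
    """Hypothetical helper: return `value` as bytes, encoding str with UTF-8."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode('utf-8')
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


fmt_from_str = as_protocol_bytes('i4')     # Unicode input is encoded
fmt_from_bytes = as_protocol_bytes(b'f8')  # Bytes input passes through
```

Since repr() of a bytes object in Python 3 carries a b prefix, any object whose repr() embeds such a string would print differently, which may be one source of the repr() change mentioned above.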
--
Pauli Virtanen