[Numpy-discussion] Bytes vs. Unicode in Python3

Charles R Harris charlesr.harris@gmail....
Thu Nov 26 18:37:43 CST 2009

Hi Pauli,

On Thu, Nov 26, 2009 at 4:08 PM, Pauli Virtanen <pav@iki.fi> wrote:

> Hi,
> The Python 3 porting needs some decisions on what is Bytes and
> what is Unicode.
> I'm currently taking the following approach. Comments?
>        ***
> dtype field names
>        Either Bytes or Unicode.
>        But 'a' and b'a' are *different* fields.
>        The issue is that:
>            Python 2: {'a': 2}[u'a'] == 2, {u'a': 2}['a'] == 2
>            Python 3: {'a': 2}[b'a'], {b'a': 2}['a'] raise exceptions
>        so the current assumptions in the C code of u'a' == b'a'
>        cease to hold.
> dtype titles
>        If Bytes or Unicode, work similarly as field names.
> dtype format strings, datetime tuple, and any other "protocol" strings
>        Bytes. User can pass in Unicode, but it's converted using
>        UTF8 codec.
>        This will likely change repr() of various objects. Acceptable?
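
The field-name point above can be checked in plain Python 3 with an ordinary dict standing in for the field table. This is only a sketch: it makes no NumPy calls, and `lookup` is a hypothetical helper introduced just for the illustration.

```python
# Plain Python 3, no NumPy: a dict stands in for a dtype's field table.
fields = {'a': 2}

def lookup(mapping, key):
    """Return mapping[key], or None if the key is absent (hypothetical helper)."""
    try:
        return mapping[key]
    except KeyError:
        return None

str_hit = lookup(fields, 'a')      # str key matches the str field name
bytes_hit = lookup(fields, b'a')   # bytes key is a different key in Python 3
keys_equal = ('a' == b'a')         # str and bytes never compare equal in Python 3
```

Under Python 3 the bytes lookup misses, so any code that assumes u'a' == b'a' has to pick one type or convert explicitly.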

I'm not clear on your recommendation here. Is it that we should use bytes,
with unicode converted to UTF8? Will that support arrays that have been
pickled and such? Or will we just have a minimum of code to fix up? And
could you expand on the changes that repr() might undergo?
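
One possible reading of the "Bytes, with Unicode converted using the UTF8 codec" rule, sketched in plain Python 3. `to_protocol_bytes` is a hypothetical name used only here, not a NumPy function, and the repr() line shows just one guess at where a visible change could come from (the b prefix on bytes objects).

```python
def to_protocol_bytes(value):
    """Hypothetical helper: normalize a "protocol" string to bytes."""
    if isinstance(value, bytes):
        return value                      # already bytes, pass through
    if isinstance(value, str):
        return value.encode('utf-8')      # Unicode input is encoded as UTF-8
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)

from_text = to_protocol_bytes('f8')       # user passed Unicode
from_bytes = to_protocol_bytes(b'f8')     # user passed bytes
shown = repr(from_text)                   # bytes repr carries a b prefix
```

Both inputs end up as the same bytes object, but its repr() now carries the b prefix. That is one way printed output could differ from Python 2.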

Mind, I think using bytes sounds best, but I haven't looked into the whole
strings part of the transition and don't have an informed opinion on the
