[Numpy-discussion] Bytes vs. Unicode in Python3
Pauli Virtanen
pav+sp@iki...
Thu Dec 3 03:36:09 CST 2009
Fri, 27 Nov 2009 23:19:58 +0100, Dag Sverre Seljebotn wrote:
[clip]
> One thing to keep in mind here is that PEP 3118 actually defines a
> standard dtype format string, which is (mostly) incompatible with
> NumPy's. It should probably be supported as well when PEP 3118 is
> implemented.
PEP 3118 is for the most part implemented in my Py3K branch now -- it was
not actually much work, as I could steal most of the format string
converter from numpy.pxd.
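For anyone who has not looked at what a PEP 3118 consumer actually sees, here is a minimal sketch using only the standard library (it is not the code from the Py3K branch). A `memoryview` over an `array.array` exposes the struct-style format string together with the item size, shape and strides; the names `values` and `view` are just illustrative.

```python
import array
import struct

# array.array exports a PEP 3118 buffer; memoryview is the consumer.
values = array.array('i', [1, 2, 3])
view = memoryview(values)

# The format string uses struct-module syntax ('i' is a C int).
print("format  :", view.format)
print("itemsize:", view.itemsize, "(struct says", struct.calcsize('i'), ")")
print("ndim    :", view.ndim)
print("shape   :", view.shape)
print("strides :", view.strides)
```

An ndarray exporting the new buffer interface would presumably be inspected the same way, with the dtype translated into the format string.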
Some questions:
How hard do we want to try supplying a buffer? E.g., if the consumer does
not specify strided but specifies suboffsets, should we try to compute
suitable suboffsets? Should we try making contiguous copies of the data
(though I guess this would break buffer semantics)?
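To make the concern about copies concrete, here is a small standard-library sketch (an illustration under my own assumptions, not NumPy code). A strided `memoryview` shares memory with its exporter, whereas a contiguous copy made with `tobytes()` stops tracking later writes, which is why handing out a copy would not behave like a real buffer.

```python
import array

data = array.array('h', range(6))      # 0, 1, 2, 3, 4, 5
view = memoryview(data)

# Every second element: a strided, non-contiguous view of the same memory.
strided = view[::2]
print(strided.contiguous, strided.strides, strided.tolist())

# A contiguous copy of the strided data, taken before the write below.
snapshot = array.array('h', strided.tobytes())

# Write through the exporter: the view sees it, the copy does not.
data[0] = 99
print(strided[0], snapshot[0])
```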
> Just something to keep in the back of one's mind when discussing this.
> For instance one could, instead of inventing something new, adopt the
> characters PEP 3118 uses (if there isn't a conflict):
>
> - b: Raw byte
> - c: ucs-1 encoding (latin 1, one byte)
> - u: ucs-2 encoding, two bytes
> - w: ucs-4 encoding, four bytes
The 'b' character is already taken so we can't easily use that. 'y' would
be free for bYtes, however.
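As a quick check of the clash, in the struct-module syntax that PEP 3118 builds on, 'b' already means a signed one-byte integer. NumPy's 'b' type code likewise denotes int8, which is not demonstrated here because the sketch sticks to the standard library.

```python
import struct

# 'b' is a signed char in struct syntax, so it cannot also mean "raw byte".
packed = struct.pack('b', -1)
(value,) = struct.unpack('b', packed)
size = struct.calcsize('b')

print(packed, value, size)
```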
> Long-term I hope the NumPy-specific format string will be deprecated, so
> that repr prints out the PEP 3118 format string etc. But, I'm aware that
> API breakage shouldn't happen when porting to Python 3.
Agreed.
A global switch could in principle be added for this, maybe -- the type
codes are for the most part stored in a dict in numerictypes.py and could
probably be easily replaced at runtime.
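A rough sketch of what such a switch might look like, purely as an illustration: the tables, `type_code` and `use_codes` below are hypothetical names and do not reflect the actual contents or API of numerictypes.py. The 'w' entry is taken from the list quoted above.

```python
from contextlib import contextmanager

# Hypothetical, heavily abbreviated code tables (assumptions for illustration).
NUMPY_CODES = {"int8": "b", "unicode": "U"}
PEP3118_CODES = {"int8": "b", "unicode": "w"}

# The single dict that lookups consult; swapping its contents is the "switch".
_active_codes = dict(NUMPY_CODES)


def type_code(name):
    """Return the type character for `name` under the active scheme."""
    return _active_codes[name]


@contextmanager
def use_codes(table):
    """Temporarily replace the active type-code table, restoring it on exit."""
    saved = dict(_active_codes)
    _active_codes.clear()
    _active_codes.update(table)
    try:
        yield
    finally:
        _active_codes.clear()
        _active_codes.update(saved)


print(type_code("unicode"))
with use_codes(PEP3118_CODES):
    print(type_code("unicode"))
print(type_code("unicode"))
```

Mutating the dict in place, rather than rebinding the name, means any code already holding a reference to it sees the change.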
--
Pauli Virtanen