[Numpy-discussion] PyArray_Scalar() and Unicode
Pauli Virtanen
pav@iki...
Sun Jun 13 07:55:22 CDT 2010
Sat, 12 Jun 2010 17:33:13 -0700, Dan Roberts wrote:
[clip: refactoring PyArray_Scalar]
> There are a few problems with this. The biggest problem for me is
> that it appears PyUCS2Buffer_FromUCS4() doesn't produce UCS2 at all, but
> rather UTF-16 since it produces surrogate pairs for code points above
> 0xFFFF. My first question is: is there any time when the data produced
> by PyUCS2Buffer_FromUCS4() wouldn't be parseable by a standards
> compliant UTF-16 decoder?
Since UTF-16 = UCS-2 + surrogate pairs, the data produced should, as far
as I know, always be parseable by DecodeUTF16.
Conversion to real UCS-2 from UCS-4 would be a lossy procedure, since
code points above 0xFFFF cannot be represented with 2 bytes.
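
To illustrate the point about surrogate pairs, here is a minimal Python
sketch of a UCS-4 to UTF-16 conversion. It is not the actual
PyUCS2Buffer_FromUCS4() code; the helper names ucs4_to_utf16_units and
units_to_bytes are made up for this example, and the input is assumed to
contain no lone surrogate code points (0xD800-0xDFFF). It only shows that
the output of such a conversion round-trips through a standard UTF-16
decoder.

```python
import struct


def ucs4_to_utf16_units(code_points):
    """Encode UCS-4 code points as a list of 16-bit UTF-16 code units.

    Hypothetical helper for illustration; assumes no lone surrogates
    in the input.
    """
    units = []
    for cp in code_points:
        if cp > 0xFFFF:
            # Outside the BMP: split into a high/low surrogate pair.
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
        else:
            # Inside the BMP: the code unit equals the code point,
            # exactly as in UCS-2.
            units.append(cp)
    return units


def units_to_bytes(units):
    """Pack 16-bit code units as little-endian bytes."""
    return struct.pack("<%dH" % len(units), *units)


# 'A', the euro sign, and U+1D11E (musical G clef, above 0xFFFF).
code_points = [0x41, 0x20AC, 0x1D11E]
units = ucs4_to_utf16_units(code_points)

# A standards-compliant UTF-16 decoder accepts the result.
decoded = units_to_bytes(units).decode("utf-16-le")
```

Only the third code point needs two code units; the first two are
encoded exactly as they would be in UCS-2.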
--
Pauli Virtanen