[Numpy-discussion] Extent of unicode types in numpy

Tim Hochberg tim.hochberg at cox.net
Tue Feb 7 10:34:09 CST 2006


Eric Firing wrote:

> Francesc, Travis,
>
> Francesc Altet wrote:
> [...]
>
>> All in all, my opinion is that allowing the coexistence of different
>> sizes of unicode types in numpy would be a recipe for disaster when
>> one wants to transport unicode characters between platforms with
>> python interpreters compiled with different unicode sizes.
>
>
> I agree--it would be a nightmare.
>
>
>> Anyway, I don't know whether the recommendation to compile Python with
>> UCS4 is widespread enough in the different distributions, but
>> people can easily check this with:
>>
>> >>> len(buffer(u"u"))
>> 4
>>
>> if the output of this is 4 (as in my example), then the interpreter is
>> using UCS4; if it is 2, it is using UCS2.
>
>
> No, it is not sufficiently widespread; Mandriva 2006 python is 
> compiled for UCS2.

Also the default build for MS Windows is compiled for UCS2.
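
As an alternative to the `buffer` check quoted above, `sys.maxunicode` is a standard attribute that reports the same thing: it is 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a wide (UCS4) build. A minimal sketch, with a hypothetical helper name; note that on Python 3.3 and later (PEP 393) `sys.maxunicode` is always 0x10FFFF, so this always reports 4 there:

```python
import sys

def unicode_build_width():
    """Hypothetical helper: return 2 on a narrow (UCS2) build, 4 on a wide (UCS4) build.

    sys.maxunicode is 0xFFFF on narrow builds and 0x10FFFF on wide builds
    (and on every Python 3.3+ interpreter).
    """
    return 2 if sys.maxunicode == 0xFFFF else 4

print(unicode_build_width())
```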

How about always storing data as UCS4 and converting it on the fly to
UCS2 when extracting a Python string from the array on a UCS2 Python
build? Isn't converting to UCS2 simply a matter of lopping off the top
two bytes? If so, the conversion is just a check that each value is
not out of range, followed by the aforementioned lopping.
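
The proposed conversion can be sketched with the standard `struct` module, assuming the array stores each character as a 4-byte UCS4 code unit in a known byte order. The function name `ucs4_to_ucs2` is hypothetical, not a numpy API. Following the proposal, it rejects values above 0xFFFF rather than emitting a surrogate pair:

```python
import struct

def ucs4_to_ucs2(ucs4_bytes, byteorder="<"):
    """Hypothetical sketch: convert a UCS4 buffer to UCS2.

    Checks that every code point fits in 16 bits, then drops the two
    high-order bytes of each 4-byte unit by repacking as unsigned shorts.
    """
    if len(ucs4_bytes) % 4:
        raise ValueError("UCS4 buffer length must be a multiple of 4")
    n = len(ucs4_bytes) // 4
    code_points = struct.unpack("%s%dI" % (byteorder, n), ucs4_bytes)
    # Range check: anything above U+FFFF cannot be represented in UCS2.
    for cp in code_points:
        if cp > 0xFFFF:
            raise ValueError("code point U+%X is out of range for UCS2" % cp)
    # All high-order bytes are now zero, so packing as 'H' is the "lopping".
    return struct.pack("%s%dH" % (byteorder, n), *code_points)

# "utf-32-le" produces UCS4 bytes without a byte-order mark.
print(ucs4_to_ucs2("numpy".encode("utf-32-le")).decode("utf-16-le"))  # prints numpy
```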

-tim
