[Numpy-discussion] Extent of unicode types in numpy
faltet at carabos.com
Mon Feb 6 10:25:07 CST 2006
I'm a bit surprised that unicode types are the only ones that break
the rule that a type must be specified with the number of bytes it
actually takes. For example:
Out: dtype('<U64')  # !!!!
which can quickly lead to problems in users' code.
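A minimal sketch of the mismatch being described, assuming a current
NumPy install (the exact repr may vary between versions):

```python
import numpy as np

# 'U16' asks for 16 unicode *characters*, but each character is
# stored as 4 bytes (UCS-4), so the actual itemsize is 64 bytes.
dt = np.dtype('U16')
print(dt.itemsize)  # 64, not 16

# By contrast, numeric type codes count bytes directly:
print(np.dtype('c16').itemsize)  # 16 bytes for a complex128
```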
I think that, for the sake of consistency, and exactly as the user must
know that a c16 is a complex taking 16 bytes, he must know that a
unicode character takes 4 bytes. With this, we should have:
and forbid unicode lengths that are not multiples of 4. I know
that, initially, it would be a bit strange for the user to specify 'S4'
for a string of 4 chars but 'U16' for a unicode string of 4 chars,
but hopefully he would soon get used to it.
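Under today's rules (not the proposal above), the asymmetry between the
two string kinds looks like this, again assuming a current NumPy:

```python
import numpy as np

# A 4-character byte string occupies 4 bytes...
print(np.dtype('S4').itemsize)   # 4

# ...while a 4-character unicode string occupies 16 bytes
# (4 chars * 4 bytes per UCS-4 code point).
print(np.dtype('U4').itemsize)   # 16
```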
The only problem I see with what I'm proposing is that I don't know
whether unicode would always take 4 bytes on all platforms
(--> 64-bit issues?). OTOH, I thought that Python represented
unicode strings internally with 16-bit chars. Oh well, I'm a bit lost
on this. Can anybody shed some light?
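Whether the interpreter itself uses 16-bit or 32-bit units can be
probed from the stdlib; "narrow" (UCS-2) versus "wide" (UCS-4) was a
build-time option in the Python 2.x era, and narrow builds were
removed in Python 3.3:

```python
import sys

# maxunicode is 0xFFFF on narrow (UCS-2) builds and 0x10FFFF on
# wide (UCS-4) builds; since Python 3.3 it is always 0x10FFFF.
if sys.maxunicode == 0x10FFFF:
    print('wide build (full code-point range)')
else:
    print('narrow build (16-bit units)')
```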
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data