[Numpy-discussion] Problem with importing numpy in Ubuntu

Fernando Perez fperez.net@gmail....
Wed Jul 28 14:36:46 CDT 2010


On Tue, Jul 27, 2010 at 7:42 AM, Sebastian Haase <seb.haase@gmail.com> wrote:
> The origin of this problem is the fact that Python supports (at least)
> 2 types of Unicode:
> 2 bytes and/or 4 bytes per character.

It only supports those two, and that's purely an internal
implementation detail.  Python can encode unicode in many encodings,
but *internally* it has to have some representation of its own, and it
can use ucs2 or ucs4.  Which one to use is a compile-time flag:

  --enable-unicode[=ucs[24]]
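On a given interpreter, one way to tell which representation was compiled in is to look at `sys.maxunicode`: a narrow (ucs2) build reports 0xFFFF and a wide (ucs4) build reports 0x10FFFF. Below is a minimal sketch. The helper name `unicode_build` is hypothetical. The check is only meaningful on Python 2.x and 3.0 to 3.2. From CPython 3.3 on (PEP 393) there is no narrow/wide build any more, and `sys.maxunicode` is always 0x10FFFF.

```python
import sys

def unicode_build():
    """Return 'ucs2' for a narrow build, 'ucs4' for a wide build.

    Hypothetical helper. On CPython 3.3+ (PEP 393) the narrow/wide
    distinction is gone and this always returns 'ucs4'.
    """
    if sys.maxunicode == 0xFFFF:
        return "ucs2"  # narrow build: 2-byte internal code units
    return "ucs4"      # wide build (or any CPython 3.3+): maxunicode == 0x10FFFF

if __name__ == "__main__":
    print("sys.maxunicode = %#x -> %s build" % (sys.maxunicode, unicode_build()))
```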

> Additionally, for some incomprehensible reason the Python source code
> (as downloaded from python.org) defaults to 2ByteUnicode whereas
> all (major) Linux distributions default to 4ByteUnicode.....

The reason is that many systems (Java, Windows, Qt natively -
http://en.wikipedia.org/wiki/Utf-16#Use_in_major_operating_systems_and_environments)
use utf-16 as their native encoding, and ucs2 is a subset of utf-16,
so in many environments that makes interoperability easier.  But ucs2
cannot encode all of unicode (it stops at U+FFFF), while ucs4 can, so
Linux distributions choose ucs4 as their internal encoding to ensure
that every unicode code point can be represented in Python.
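The ucs2 limitation is easy to see with a concrete code point. A character above U+FFFF, such as U+1F600, takes two 16-bit units in UTF-16 (a surrogate pair). Plain ucs2 has no way to express a surrogate pair. ucs4 stores the same character in a single 32-bit unit. The sketch below computes the pair by hand and checks it against Python's own `utf-16-be` and `utf-32-be` codecs. The helper name `utf16_code_units` is hypothetical.

```python
import struct

def utf16_code_units(cp):
    """Split one code point into its UTF-16 code units (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %#x" % cp)
    if cp <= 0xFFFF:
        return [cp]                  # BMP: one 16-bit unit, same as ucs2
    v = cp - 0x10000                 # 20 bits to split across two units
    high = 0xD800 + (v >> 10)        # lead surrogate carries the top 10 bits
    low = 0xDC00 + (v & 0x3FF)       # trail surrogate carries the low 10 bits
    return [high, low]

if __name__ == "__main__":
    s = u"\U0001F600"                # a code point outside the BMP
    utf16 = s.encode("utf-16-be")    # big-endian, no BOM
    units = list(struct.unpack(">%dH" % (len(utf16) // 2), utf16))
    print([hex(u) for u in units])   # ['0xd83d', '0xde00']
    assert units == utf16_code_units(0x1F600)
    print(len(s.encode("utf-32-be")))  # 4: a single 32-bit unit, as in ucs4
```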

This email from Guido explains his position on leaving the ucs2/4
choice up to packagers:

http://mail.python.org/pipermail/python-dev/2008-July/080892.html

The official Python 2.x unicode story is well explained here:
http://docs.python.org/howto/unicode.html

and here is the corresponding document for 3.x:
http://docs.python.org/release/3.1.2/howto/unicode.html

Joel Spolsky has a very nice introduction to the main ideas behind unicode:
http://www.joelonsoftware.com/articles/Unicode.html

and Matthew Brett has a nice and more concise set of notes on the matter:

https://cirl.berkeley.edu/mb312/pydagogue/introducing_unicode.html
https://cirl.berkeley.edu/mb312/pydagogue/python_unicode.html


I should note that anyone who is thinking of porting any non-trivial
amount of code from Python 2.x to 3.x will save a lot of time and
frustration by spending just a couple of hours reading and
understanding the above.  It's not that much work, and if you don't
understand how Python thinks of strings, you're very likely to make a
painful mess in such a code transition effort.  I know that the few
hours I put into reading the above have already paid off tremendously
for us with the zeromq/ipython codebase.
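The core idea those documents cover can be sketched in a few lines of Python 3. `str` holds text (code points) and `bytes` holds encoded data. You convert between them explicitly with `encode` and `decode`. This is a minimal illustration, and the sample string is arbitrary.

```python
# Python 3: str holds text (code points), bytes holds encoded data.
text = "caf\u00e9"               # 4 code points; the last is U+00E9
data = text.encode("utf-8")      # b'caf\xc3\xa9': U+00E9 needs 2 bytes in UTF-8

print(len(text), len(data))      # 4 5
print(data.decode("utf-8") == text)  # True: decode reverses encode

# Mixing the two types is an error in Python 3.  Python 2 would instead
# try to decode the byte string implicitly with the ASCII codec, which
# works for pure-ASCII data and fails only once non-ASCII bytes show up.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)
```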

Cheers,

f
