[Numpy-discussion] bug in genfromtxt for python 3.2

Pauli Virtanen pav@iki...
Wed Mar 30 13:32:34 CDT 2011


On Wed, 30 Mar 2011 10:37:45 -0700, Matthew Brett wrote:
[clip]
> imagine I'm working with a non-latin default encoding, and I've opened a
> file:
> 
> fobj = open('my_nonlatin.txt', 'rt')
> 
> in Python 3.2. That might contain numbers and non-latin text. I can't
> pass that into 'genfromtxt' because it will give me this error above. I
> can pass it in as binary but then I'll get garbled text.

That's how it works on Python 2 as well. The text is not garbled -- it's 
just stored as bytes in the file's encoding, which you can decode to 
unicode later on:

>>> np.array(['asd']).view(np.chararray).decode('utf-8')
array([u'asd'], 
      dtype='<U3')
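The same "read as bytes, decode later" idea, written for Python 3 as a minimal sketch. It does not call genfromtxt; the file contents and the hand-written splitting on commas are assumptions made purely for illustration, standing in for what a byte-oriented parser does.

```python
import io

import numpy as np

# Hypothetical file contents: a numeric column and a non-Latin text column,
# stored as UTF-8 on disk.
raw = "1.5,Привет\n2.5,日本語\n".encode("utf-8")

numbers = []
labels = []
# Read in binary mode and split on the ASCII bytes b"\n" and b",".
for line in io.BytesIO(raw):
    num_field, text_field = line.rstrip(b"\n").split(b",")
    numbers.append(float(num_field))  # float() accepts ASCII digits given as bytes
    labels.append(text_field)         # keep the text undecoded for now

values = np.array(numbers)
text_bytes = np.array(labels)  # bytes ('S') dtype: nothing is garbled, just undecoded

# Decode to unicode only when it is actually needed.
text = np.char.decode(text_bytes, "utf-8")
```

Here `values` holds the floats and `text` holds the original Cyrillic and Japanese strings; the bytes array is only an intermediate representation.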

Granted, UTF-16 and similar encodings might be problematic, since there 
the newline and delimiter characters are not single ASCII bytes.
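To illustrate why UTF-16 is harder for a byte-oriented reader, here is a small sketch on made-up data. It uses plain Python byte splitting (an assumption standing in for any parser that looks for single-byte delimiters), not genfromtxt itself.

```python
text = "1,2\n3,4\n"  # hypothetical two-row, comma-separated data

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark

# UTF-8: ASCII characters keep their single-byte form, so splitting on
# b"\n" and b"," recovers the fields directly.
utf8_rows = [line.split(b",") for line in utf8.split(b"\n") if line]

# UTF-16: each character here takes two bytes ("\n" becomes b"\n\x00"),
# so the same byte-level split leaves stray NUL bytes in every field.
utf16_rows = [line.split(b",") for line in utf16.split(b"\n") if line]
```

With UTF-8 the rows come out as `[[b"1", b"2"], [b"3", b"4"]]`, while the first UTF-16 field is `b"1\x00"`, which `float()` rejects with a `ValueError`.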

> Should those functions also allow unicode-providing files (perhaps with
> binary as default for speed)?

As far as I know, nobody has asked for this feature yet, so I guess the 
need for it is pretty low.

Personally, I don't think going unicode makes much sense here. First, it 
would be a Py3-only feature. Second, there is a real need for it only 
when dealing with multibyte encodings that are not ASCII-compatible 
(UTF-16 and the like), which are seldom used these days, with UTF-8 
rightfully dominating.
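The UTF-8 point can be checked directly: in UTF-8, every byte of a non-ASCII character has its high bit set, so ASCII digits, commas and newlines never appear inside non-Latin text, and a byte-level parser only needs to decode the text fields afterwards. A small sketch on made-up strings (the helper name is hypothetical):

```python
# Hypothetical lines mixing ASCII numbers with non-Latin text.
samples = ["1.5,Привет", "2.5,日本語", "3,€ and é"]


def non_ascii_uses_high_bytes(s: str) -> bool:
    """Return True if every non-ASCII character of s encodes to UTF-8 bytes >= 0x80."""
    for ch in s:
        if ord(ch) >= 0x80 and any(b < 0x80 for b in ch.encode("utf-8")):
            return False
    return True


high_bytes_ok = [non_ascii_uses_high_bytes(s) for s in samples]

# Consequence: splitting the encoded bytes on b"," gives the same fields as
# splitting the decoded text on "," and encoding each field afterwards.
same_fields = all(
    s.encode("utf-8").split(b",") == [f.encode("utf-8") for f in s.split(",")]
    for s in samples
)
```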

-- 
Pauli Virtanen
