[Numpy-discussion] bug in genfromtxt for python 3.2
Pauli Virtanen
pav@iki...
Wed Mar 30 13:32:34 CDT 2011
On Wed, 30 Mar 2011 10:37:45 -0700, Matthew Brett wrote:
[clip]
> imagine I'm working with a non-latin default encoding, and I've opened a
> file:
>
> fobj = open('my_nonlatin.txt', 'rt')
>
> in python 3.2. That might contain numbers and non-latin text. I can't
> pass that into 'genfromtxt' because it will give me this error above. I
> can pass it in as binary but then I'll get garbled text.
That's the way it also works on Python 2. The text is not garbled -- it's
just in some binary representation that you can later on decode to
unicode:
    >>> np.array(['asd']).view(np.chararray).decode('utf-8')
    array([u'asd'],
          dtype='<U3')
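On Python 3 the same idea can be written with explicit bytes. The following is a minimal sketch, assuming the text column came back as a NumPy byte-string array (built by hand here rather than read with genfromtxt) and that the file was UTF-8 encoded. The sample values are made up for illustration.

```python
import numpy as np

# Hypothetical column of UTF-8 encoded text, as it would look after
# being read in binary mode (assumption: built by hand for this sketch).
raw = np.array(["Ωμέγα".encode("utf-8"), b"asd"])

# np.char.decode applies bytes.decode element-wise and returns a
# unicode (dtype kind 'U') array; the bytes were never garbled.
text = np.char.decode(raw, "utf-8")
```

Here raw holds byte strings (dtype kind 'S') and text holds the decoded strings, so the decoding step can be deferred until after parsing.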
Granted, utf-16 and its ilk might be problematic.
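To illustrate why, here is a small sketch using only plain Python bytes, not genfromtxt itself. It assumes a comma-delimited row and a byte-level split on the delimiter; the sample row and the try_decode helper are hypothetical.

```python
# Hypothetical sample row: a number and a non-Latin label.
row = "3.14,Ωμέγα"

# UTF-8: ASCII delimiters never occur inside a multibyte sequence,
# so a byte-level split followed by a later decode is lossless.
utf8_fields = row.encode("utf-8").split(b",")
utf8_decoded = [f.decode("utf-8") for f in utf8_fields]

# UTF-16: every code unit is two bytes, so the comma is b",\x00" in
# little-endian order and a split on b"," cuts a code unit in half.
utf16_fields = row.encode("utf-16-le").split(b",")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are not valid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


utf16_decoded = [try_decode(f, "utf-16-le") for f in utf16_fields]
```

With UTF-8 both fields decode back to the original text. With UTF-16 the second field starts with the stray zero byte left over from the comma, has an odd length, and no longer decodes.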
> Should those functions also allow unicode-providing files (perhaps with
> binary as default for speed)?
Nobody has yet asked for this feature as far as I know, so I guess the
need for it is pretty low.
Personally, I don't think going unicode makes much sense here. First, it
would be a Py3-only feature. Second, there is a real need for it only
when dealing with multibyte encodings, which are seldom used these days
with utf-8 rightfully dominating.
--
Pauli Virtanen