[Numpy-discussion] bug in genfromtxt for python 3.2
Wed Mar 30 14:48:18 CDT 2011
On Wed, Mar 30, 2011 at 11:32 AM, Pauli Virtanen <firstname.lastname@example.org> wrote:
> On Wed, 30 Mar 2011 10:37:45 -0700, Matthew Brett wrote:
>> imagine I'm working with a non-latin default encoding, and I've opened a
>> fobj = open('my_nonlatin.txt', 'rt')
>> in python 3.2. That might contain numbers and non-latin text. I can't
>> pass that into 'genfromtxt' because it will give me this error above. I
>> can pass it in as binary but then I'll get garbled text.
> That's the way it also works on Python 2. The text is not garbled -- it's
> just in some binary representation that you can later on decode to
> Granted, utf-16 and the ilk might be problematic.
>> Should those functions also allow unicode-providing files (perhaps with
>> binary as default for speed)?
> Nobody has yet asked for this feature as far as I know, so I guess the
> need for it is pretty low.
> Personally, I don't think going unicode makes much sense here. First, it
> would be a Py3-only feature. Second, there is a real need for it only
> when dealing with multibyte encodings, which are seldom used these days
> with utf-8 rightfully dominating.
It's not a feature I need, but then, I'm afraid all the languages I've
been taught are latin-1. Oh, except I learnt a tiny bit of Greek.
But I don't use it for work :)
I suppose the annoyances would be:
1) Probably temporary surprise that genfromtxt(open('my_file.txt',
'rt')) generates this error
2) Having to go back over returned arrays decoding stuff for utf-8
3) Wrong results for other encodings
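
Points 2 and 3 can be illustrated with a minimal sketch. The byte strings
below are made-up sample data standing in for a string column read from a
file opened in binary mode, and np.char.decode is just one possible way to
decode after the fact, not something the routine does for you.

```python
import numpy as np

# Hypothetical stand-in for a string column returned from a binary-mode read.
raw = np.array(["αβγ".encode("utf-8"), "delta".encode("utf-8")])

# Point 2: the caller has to go back and decode the returned bytes explicitly.
decoded = np.char.decode(raw, "utf-8")

# Point 3: decoding the same bytes with the wrong codec raises no error
# (latin-1 maps every byte to some character), it just gives wrong text.
wrong = np.char.decode(raw, "latin-1")
```

Here `decoded` holds the original text, while `wrong` silently holds
different characters for the Greek entry.
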
Maybe the best way is a graceful warning on entry to the routine?
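
A rough sketch of what such a warning could look like, assuming a check on
the file object's type is acceptable. The helper name warn_if_text_mode and
the message wording are hypothetical; nothing like this exists in numpy as
far as this thread says.

```python
import io
import warnings


def warn_if_text_mode(fobj):
    """Hypothetical helper: warn when handed a text-mode file object.

    Returns True if a warning was issued, False otherwise.
    """
    # open(name, 'rt') returns an io.TextIOWrapper, a subclass of io.TextIOBase.
    if isinstance(fobj, io.TextIOBase):
        warnings.warn(
            "file object was opened in text mode; open it in binary mode "
            "('rb') and decode the returned strings afterwards",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

A reading routine could call this on entry and then either continue or raise
a clearer error than the current one.
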