[Numpy-discussion] bug in genfromtxt for python 3.2

Ralf Gommers ralf.gommers@googlemail....
Wed Mar 30 13:12:18 CDT 2011

On Wed, Mar 30, 2011 at 7:37 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> Hi,
> On Wed, Mar 30, 2011 at 10:02 AM, Ralf Gommers
> <ralf.gommers@googlemail.com> wrote:
>> On Wed, Mar 30, 2011 at 3:39 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
>>> Hi,
>>> On Mon, Mar 28, 2011 at 11:29 PM,  <josef.pktd@gmail.com> wrote:
>>>> numpy/lib/test_io.py    only uses StringIO in the test, no actual csv file
>>>> If I give the filename then I get a TypeError: Can't convert 'bytes'
>>>> object to str implicitly
>>>> from the statsmodels mailing list example
>>>>>>>> data = recfromtxt(open('./star98.csv', "U"), delimiter=",", skip_header=1, dtype=float)
>>>>> Traceback (most recent call last):
>>>>>  File "<pyshell#30>", line 1, in <module>
>>>>>    data = recfromtxt(open('./star98.csv', "U"), delimiter=",",
>>>>> skip_header=1, dtype=float)
>>>>>  File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py",
>>>>> line 1633, in recfromtxt
>>>>>    output = genfromtxt(fname, **kwargs)
>>>>>  File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py",
>>>>> line 1181, in genfromtxt
>>>>>    first_values = split_line(first_line)
>>>>>  File "C:\Programs\Python32\lib\site-packages\numpy\lib\_iotools.py",
>>>>> line 206, in _delimited_splitter
>>>>>    line = line.split(self.comments)[0].strip(asbytes(" \r\n"))
>>>>> TypeError: Can't convert 'bytes' object to str implicitly
>>> Is the right fix for this to open a 'filename' passed to genfromtxt,
>>> as 'binary' (bytes)?
>>> If so I will submit a pull request with a fix and a test,
>> Seems to work and is what was intended I think, see Pauli's
>> changes/notes in commit 0f2e7db0.
>> This is ticket #1607 by the way.
> Thanks for making a ticket.  I've submitted a pull request for the fix
> and linked to it from the ticket.
> The reason I asked whether this was the correct fix was:
> imagine I'm working with a non-latin default encoding, and I've opened a file:
> fobj = open('my_nonlatin.txt', 'rt')
> in python 3.2.  That might contain numbers and non-latin text.   I
> can't pass that into 'genfromtxt' because it will give me this error
> above.  I can pass it in as binary but then I'll get garbled text.

I admit the string/bytes thing is still a little confusing to me, but
isn't that always going to be a problem (even with python 2.x)?
There's no way for genfromtxt to know what the encoding of an
arbitrary file is. So your choices are garbled text or an error.
Garbled text is better.
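
To make the trade-off concrete, here is a minimal standard-library sketch.
It is not numpy's code: the sample line and the parse_row helper are made
up for illustration, and the exact TypeError message depends on the Python
version. It shows the str/bytes mix that fails as in the traceback above,
and how working on bytes keeps the numeric columns readable even when the
encoding of the text column is unknown.

```python
# Hypothetical CSV line: two numbers and a non-Latin label, stored as UTF-8.
label = "\u03b1\u03b2\u03b3"  # Greek letters alpha, beta, gamma
raw = ("3.5,7.25," + label + "\n").encode("utf-8")

# A text-mode file yields str lines; splitting them on a bytes separator
# is the kind of str/bytes mix that raises TypeError on Python 3.
mixed_error = None
try:
    raw.decode("utf-8").split(b"#")
except TypeError as exc:
    mixed_error = exc


def parse_row(line, delimiter=b",", comments=b"#"):
    """Illustrative helper: split one bytes line, staying in bytes throughout."""
    line = line.split(comments)[0].strip(b" \r\n")
    return line.split(delimiter)


fields = parse_row(raw)

# The numeric fields are plain ASCII, so they parse whatever the file encoding.
numbers = [float(f.decode("ascii")) for f in fields[:2]]

# Guessing the wrong encoding does not fail, it just garbles the text.
garbled = fields[2].decode("latin-1")

# Only a caller who knows the real encoding can recover the label.
recovered = fields[2].decode("utf-8")
```

The numbers survive either way; only the text column depends on a piece of
information (the encoding) that the reader function does not have.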

It may help to explicitly say in the docstring that this is an ASCII
routine (as it does in the source code).


> Should those functions also allow unicode-providing files (perhaps
> with binary as default for speed)?
