[Numpy-discussion] Forbidden character in the "names" argument of genfromtxt?

Skipper Seabold jsseabold@gmail....
Mon Feb 20 12:58:53 CST 2012


On Mon, Feb 20, 2012 at 1:35 PM, Brett Olsen <brett.olsen@gmail.com> wrote:
> On Sat, Feb 18, 2012 at 8:12 PM, Adam Hughes <hugadams@gwmail.gwu.edu> wrote:
>> Hey everyone,
>>
>> I have timeseries data in which the column label is simply a filename from
>> which the original data was taken.  Here's some sample data:
>>
>> name1.txt  name2.txt  name3.txt
>> 32              34            953
>> 32              03            402
>>
>> I've noticed that the standard genfromtxt() method works great; however, the
>> names aren't written correctly.  That is, if I use the command:
>>
>> print data['name1.txt']
>>
>> Nothing happens.
>>
>> However, when I remove the file extension, e.g.:
>>
>> name1  name2  name3
>> 32              34            953
>> 32              03            402
>>
>> Then print data['name1'] returns (32, 32) as expected.  It seems that the
>> period in the name isn't compatible with the genfromtxt() names attribute.
>> Is there a workaround, or do I need to restructure my program to get the
>> extension removed?  I'd rather not do this if possible for reasons that
>> aren't important for the discussion at hand.
>
> It looks like the period is just getting stripped out of the names:
>
> In [1]: import numpy as N
>
> In [2]: N.genfromtxt('sample.txt', names=True)
> Out[2]:
> array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
>      dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
>
> Interestingly, this still happens if you supply the names manually:
>
> In [17]: def reader(filename):
>   ....:     infile = open(filename, 'r')
>   ....:     names = infile.readline().split()
>   ....:     data = N.genfromtxt(infile, names=names)
>   ....:     infile.close()
>   ....:     return data
>   ....:
>
> In [20]: data = reader('sample.txt')
>
> In [21]: data
> Out[21]:
> array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
>      dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
>
> What you can do is reset the names after genfromtxt is through with it, though:
>
> In [34]: def reader(filename):
>   ....:     infile = open(filename, 'r')
>   ....:     names = infile.readline().split()
>   ....:     infile.close()
>   ....:     data = N.genfromtxt(filename, names=True)
>   ....:     data.dtype.names = names
>   ....:     return data
>   ....:
>
> In [35]: data = reader('sample.txt')
>
> In [36]: data
> Out[36]:
> array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
>      dtype=[('name1.txt', '<f8'), ('name2.txt', '<f8'), ('name3.txt', '<f8')])
>
> Be warned, I don't know why the period is getting stripped; there may
> be a good reason, and adding it in might cause problems.

I think the period is stripped because recarrays also offer attribute
access to field names, and Python would read a period inside a name as
a second attribute lookup. So you wouldn't be able to do

your_array.sample.txt
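
To make the attribute-access point concrete, here is a minimal sketch. The
array below is built by hand purely for illustration (the field names and
values are not read from any file), and it assumes a reasonably recent NumPy:

```python
import numpy as np

# Hypothetical two-row structured array with "safe" field names.
arr = np.array([(32.0, 34.0), (32.0, 3.0)],
               dtype=[('name1txt', '<f8'), ('name2txt', '<f8')])

# Viewing it as a recarray exposes each field as an attribute.
rec = arr.view(np.recarray)
col = rec.name1txt            # same data as rec['name1txt']

# A field literally named 'name1.txt' could only be reached with item
# access (rec['name1.txt']), because rec.name1.txt would be parsed by
# Python as (rec.name1).txt.
```

Item access with brackets keeps working for any field name, which is why
renaming the fields afterwards only costs you the attribute shortcut.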

All the names get passed through a name validator. IIRC it's something like

from numpy.lib import _iotools

validator = _iotools.NameValidator()

validator.validate('sample1.txt')
validator.validate('a name with spaces')

NameValidator has a good docstring and the gist of this should be in
the genfromtxt docs, if it's not already.
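
A rough sketch of what the validator does with its defaults, followed by the
rename-afterwards workaround from earlier in the thread. Assumptions: `_iotools`
is a private module, so its location and behaviour may differ between NumPy
versions; the sample text is typed inline through `io.StringIO` instead of
being read from sample.txt; and the variable names are made up for the example.

```python
import io
import numpy as np
from numpy.lib import _iotools  # private module, may change between versions

# Default validator: characters such as '.' are deleted and spaces are
# replaced with underscores.
validator = _iotools.NameValidator()
cleaned = validator.validate(['name1.txt', 'a name with spaces'])

# An empty deletechars string leaves the period alone.
keep_dots = _iotools.NameValidator(deletechars='')
kept = keep_dots.validate(['name1.txt'])

# Workaround from the thread: let genfromtxt clean the names, then put
# the original header names back on the dtype.
text = "name1.txt name2.txt name3.txt\n32 34 953\n32 03 402\n"
header = text.splitlines()[0].split()
data = np.genfromtxt(io.StringIO(text), names=True)
stripped_names = data.dtype.names      # names as genfromtxt produced them
data.dtype.names = tuple(header)       # restore 'name1.txt', ...
first_col = data['name1.txt']
```

genfromtxt also accepts a `deletechars` argument of its own; I have not
checked how consistently it is honoured across versions, so resetting
`dtype.names` afterwards is the more conservative route.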

Skipper

