[Numpy-discussion] Forbidden character in the "names" argument of genfromtxt?
Adam Hughes
hugadams@gwmail.gwu....
Mon Feb 20 13:02:16 CST 2012
Thanks for clearing that up.
> >> Hey everyone,
> >> I have timeseries data in which the column label is simply a filename from
> >> which the original data was taken. Here's some sample data:
> >>
> >> name1.txt name2.txt name3.txt
> >> 32 34 953
> >> 32 03 402
> >> I've noticed that the standard genfromtxt() method works great; however, the
> >> names aren't written correctly. That is, if I use the command:
> >>
> >> print data['name1.txt']
> >> Nothing happens.
> >>
> >> However, when I remove the file extension, e.g.:
> >>
> >> name1 name2 name3
> >> 32 34 953
> >> 32 03 402
> >> Then print data['name1'] returns (32, 32) as expected. It seems that the
> >> period in the name isn't compatible with the genfromtxt() names argument.
> >> Is there a workaround, or do I need to restructure my program to get the
> >> extension removed? I'd rather not do this if possible for reasons that
> >> aren't important for the discussion at hand.
> > It looks like the period is just getting stripped out of the names:
> >
> > In [1]: import numpy as N
> >
> > In [2]: N.genfromtxt('sample.txt', names=True)
> > Out[2]:
> > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
> >       dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
> > Interestingly, this still happens if you supply the names manually:
> >
> > In [17]: def reader(filename):
> >    ....:     infile = open(filename, 'r')
> >    ....:     names = infile.readline().split()
> >    ....:     data = N.genfromtxt(infile, names=names)
> >    ....:     infile.close()
> >    ....:     return data
> >    ....:
> >
> > In [20]: data = reader('sample.txt')
> >
> > In [21]: data
> > Out[21]:
> > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
> >       dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
> > What you can do is reset the names after genfromtxt is through with it, though:
> >
> > In [34]: def reader(filename):
> >    ....:     infile = open(filename, 'r')
> >    ....:     names = infile.readline().split()
> >    ....:     infile.close()
> >    ....:     data = N.genfromtxt(filename, names=True)
> >    ....:     data.dtype.names = names
> >    ....:     return data
> >    ....:
> >
> > In [35]: data = reader('sample.txt')
> >
> > In [36]: data
> > Out[36]:
> > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
> >       dtype=[('name1.txt', '<f8'), ('name2.txt', '<f8'), ('name3.txt', '<f8')])
> > Be warned, I don't know why the period is getting stripped; there may
> > be a good reason, and adding it in might cause problems.
> I think the period is stripped because recarrays also offer attribute
> access of names. So you wouldn't be able to do
>
> your_array.sample.txt
>
> All the names get passed through a name validator. IIRC it's something like
>
> from numpy.lib import _iotools
>
> validator = _iotools.NameValidator()
>
> validator.validate('sample1.txt')
> validator.validate('a name with spaces')
>
> NameValidator has a good docstring and the gist of this should be in
> the genfromtxt docs, if it's not already.
>
> Skipper
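
For anyone finding this thread later, here is a minimal, self-contained sketch of the workaround quoted above (reset the names after genfromtxt is done with them). It assumes a reasonably recent NumPy on Python 3, where genfromtxt accepts an io.StringIO of text. The helper name read_keeping_header_names and the inlined SAMPLE string are hypothetical, used only so the example runs without a file on disk. As warned above, restoring the periods means attribute-style access on a recarray won't work for these fields.

```python
import io

import numpy as np

# Sample data from the original question, inlined so no file is needed.
SAMPLE = """name1.txt name2.txt name3.txt
32 34 953
32 03 402
"""


def read_keeping_header_names(text):
    """Hypothetical helper: parse with genfromtxt, then restore the raw header names."""
    # Read the header ourselves, before genfromtxt's name validation touches it.
    header = text.splitlines()[0].split()

    # names=True makes genfromtxt take field names from the first line;
    # the periods are removed by its name validation.
    data = np.genfromtxt(io.StringIO(text), names=True)
    stripped = data.dtype.names

    # Put the original labels back on the structured dtype.
    data.dtype.names = tuple(header)
    return data, stripped


data, stripped_names = read_keeping_header_names(SAMPLE)
print(stripped_names)      # names as genfromtxt validated them
print(data.dtype.names)    # names after restoring the header
print(data['name1.txt'])   # dictionary-style access with the period
```

Item access such as data['name1.txt'] is the only way to reach these columns once the periods are back, which seems to be the trade-off the validator is protecting against.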
