[SciPy-User] Scikits TimeSeries modifies names of features (dtypes)

Sergi Pons Freixes spons@utm.csic...
Fri Mar 26 10:38:25 CDT 2010

On Wed, Mar 24, 2010 at 5:27 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
> Actually, this is not a tsfromtxt issue, but an np.genfromtxt one (np.genfromtxt is the function tsfromtxt is derived from). By giving names in the input, you indicate that you want a structured array (with named fields). Structured arrays can also be viewed as recarrays, where named fields can be accessed as attributes. Obviously, having a space, a comma, a period or any other non-alphanumeric character in a field name prevents access by attribute from working. For that reason, np.genfromtxt checks whether the field names are valid and modifies them if they're not (by replacing any non-alphanumeric character with an underscore). Think of it as a feature, not a bug.
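A minimal sketch of the validation described above, assuming a recent NumPy (the CSV text is made up for illustration). In current releases, characters listed in the `deletechars` argument are dropped and spaces are replaced by `replace_space` (default `'_'`), so the exact renaming may differ slightly from a plain underscore substitution:

```python
from io import StringIO

import numpy as np

# Hypothetical CSV whose header contains a period, a space and parentheses.
text = "Label.1,Label (2)\n1,2\n3,4\n"

# names=True takes the field names from the first line; by default
# genfromtxt validates them before building the structured dtype.
arr = np.genfromtxt(StringIO(text), delimiter=",", names=True, dtype=int)

print(arr.dtype.names)
# Every validated name is a legal Python identifier, so a recarray view
# can expose each field as an attribute.
print(all(name.isidentifier() for name in arr.dtype.names))  # True
```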

Ok. At first I also considered the possibility that "they're
removing strange characters for the sake of simplicity", but I was
shocked that creating the array directly worked:

In [9]: x = np.array([(1,2),(3,4)], dtype=[('Label.1', '<i4'), ('Label (2)', '<i4')])

In [10]: x
array([(1, 2), (3, 4)],
      dtype=[('Label.1', '<i4'), ('Label (2)', '<i4')])

In [11]: x['Label (2)']
Out[11]: array([2, 4])

So there is no cleaning in this case... :S. It's odd that it has been
implemented in np.genfromtxt.
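For comparison, a small sketch (not part of the original session) that builds the same array with `np.array`: the names are kept verbatim and indexing by key works, but neither name is a valid Python identifier, so a `np.recarray` view cannot expose them through ordinary attribute syntax:

```python
import numpy as np

# Same dtype as in the session above: field names that are not identifiers.
x = np.array([(1, 2), (3, 4)], dtype=[("Label.1", "<i4"), ("Label (2)", "<i4")])

# Indexing by field name accepts any string.
print(x["Label (2)"])  # [2 4]

# A recarray view still supports key access, but "rec.Label (2)" could never
# be written as an attribute expression, so these fields stay key-only.
rec = x.view(np.recarray)
print(rec["Label.1"])  # [1 3]
print([name.isidentifier() for name in rec.dtype.names])  # [False, False]
```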

> Now, I can agree that force-feeding this attempt at foolproof behavior to the user is a bit controlling. After all, if you decided to use a name like "Agr.fil. G. corsicum 200x" as a field name, you must have a very good reason, and won't mind not being able to use recarrays.

I agree that the field names are not "the best choice", but they are
automatically created from a txt file and automatically used in the
code (i.e., I never need to type "Agr.fil. G. corsicum 200x"), so I
didn't care much about it. In the meantime, I may implement a temporary
workaround that cleans the names before passing them as labels.
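One possible form of such a workaround, as a sketch only: `clean_label` is a hypothetical helper, and the dictionary keeps the original labels read from the text file so the rest of the code can keep referring to them unchanged:

```python
import re

import numpy as np


def clean_label(label: str) -> str:
    """Hypothetical helper: turn an arbitrary label into a valid identifier."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", label).strip("_")
    # Identifiers cannot be empty or start with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "f_" + cleaned
    return cleaned


labels = ["Agr.fil. G. corsicum 200x", "Label (2)"]

# Map original label -> cleaned field name, refusing collisions.
mapping = {label: clean_label(label) for label in labels}
if len(set(mapping.values())) != len(mapping):
    raise ValueError("two labels clean to the same field name")

data = np.array([(1, 2), (3, 4)], dtype=[(mapping[lab], "<i4") for lab in labels])

# Code keeps using the original label and looks up the cleaned name.
print(mapping["Agr.fil. G. corsicum 200x"])  # Agr_fil_G_corsicum_200x
print(data[mapping["Label (2)"]])  # [2 4]
```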

> So, what I can do is leave the automatic validation of names as the default, but introduce an option to bypass it. Would that be OK for everybody?

It's ok for me.
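In current NumPy, `np.genfromtxt` accepts `deletechars` and `replace_space` keywords (alongside `excludelist` and `case_sensitive`) that control this validation. Whether they are exactly the option proposed in this thread is an assumption, and the sketch covers only `np.genfromtxt`, not `tsfromtxt`. Passing an empty `deletechars` and a space for `replace_space` keeps the header names as written:

```python
from io import StringIO

import numpy as np

# Hypothetical CSV with awkward header names.
text = "Label.1,Label (2)\n1,2\n3,4\n"

# Default behaviour: names are validated and rewritten.
validated = np.genfromtxt(StringIO(text), delimiter=",", names=True, dtype=int)

# Bypass: delete no characters and "replace" spaces with spaces.
verbatim = np.genfromtxt(
    StringIO(text),
    delimiter=",",
    names=True,
    dtype=int,
    deletechars="",
    replace_space=" ",
)

print(validated.dtype.names)
print(verbatim.dtype.names)
print(verbatim["Label (2)"])
```

Names kept this way work with key indexing, as in the `np.array` example, but not as recarray attributes.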
