[SciPy-User] Scikits TimeSeries modifies names of features (dtypes

Pierre GM pgmdevlist@gmail....
Wed Mar 24 11:27:59 CDT 2010

On Mar 24, 2010, at 9:43 AM, Sergi Pons Freixes wrote:
> I have this code, which simply creates a TimeSeries object from a datafile:
> data_labels = ["Temp", "Sal", "O2"] + self.species.keys()
> tmpdata = ts.tsfromtxt(join(dir,f), freq='D', delimiter='\t', \
>              usecols=range(len(data_labels)), datecols=0, \
>              names=data_labels, dateconverter=self.__dateconverter)
> where self.species.keys() is a list of strings, some of them with
> spaces, periods, or parentheses. Two examples: "Phaeocystis
> (colonies)", "Agr.fil. G. corsicum 200x".
> But if I check the resulting tmpdata.dtype.names (or simply
> tmpdata.dtype), all the spaces have been changed to underscores, and
> the periods and parentheses removed. Examples: "Phaeocystis_colonies",
> "Agrfil_G_corsicum_200x".
> I've checked the documentation for timeseries and for numpy record
> arrays, and cannot find the reason... Any hint about it?

Actually, this is not a tsfromtxt issue but a np.genfromtxt one (np.genfromtxt is the function tsfromtxt is derived from). By giving names in the input, you indicate that you want a structured array (with named fields). Structured arrays can also be viewed as recarrays, where named fields can be accessed as attributes. Obviously, a space, a comma, a period or any other character that is not valid in a Python identifier in a field name prevents access by attribute from working. For that reason, np.genfromtxt checks whether the field names are valid and modifies them if they're not: spaces are replaced by underscores and the other offending characters are removed, which is exactly what you observed. Think of it as a feature, not a bug.
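To make the behavior concrete, here is a minimal sketch that calls np.genfromtxt directly rather than ts.tsfromtxt. The two species names are the ones quoted above; the numeric values and the in-memory "file" are made-up placeholders, not data from the original report.

```python
import io
import numpy as np

# Field names from the thread; "Temp" plus two names that contain
# spaces, periods and parentheses.
raw_names = ["Temp", "Phaeocystis (colonies)", "Agr.fil. G. corsicum 200x"]

# Placeholder tab-separated data (assumed values, for illustration only).
text = "12.5\t3.0\t0.0\n13.1\t4.0\t1.0\n"

# Passing names= asks for a structured array; genfromtxt validates the names.
data = np.genfromtxt(io.StringIO(text), delimiter="\t", names=raw_names)

# The stored field names are the sanitized ones, not the ones passed in.
clean_names = data.dtype.names
print(clean_names)

# Because the names are now valid identifiers, attribute access works
# on a recarray view of the same data.
rec = data.view(np.recarray)
print(rec.Phaeocystis_colonies)
```

The sanitized names are what make the last line possible: the original "Phaeocystis (colonies)" could never be written after a dot.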
Now, I can agree that force-feeding this attempt at foolproof behavior to the user is a bit controlling. After all, if you decided to use a name like "Agr.fil. G. corsicum 200x" as a field name, you must have a very good reason, and you won't mind not being able to access that field as a recarray attribute. So, what I can do is leave the automatic validation of names as the default, but introduce an option to bypass it. Would that be OK for everybody?
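As a side note on why bypassing the validation is harmless, the sketch below builds a plain structured array by hand (no genfromtxt involved) with one of the unsanitized names. The values are placeholders. It only shows that NumPy itself accepts such a field name and that access by key keeps working; it does not show the proposed option, which does not exist yet at the time of writing.

```python
import numpy as np

raw_name = "Agr.fil. G. corsicum 200x"

# A structured dtype accepts arbitrary strings as field names.
arr = np.zeros(3, dtype=[("Temp", float), (raw_name, float)])

# Access by key works regardless of the characters in the name.
arr[raw_name] = [1.0, 2.0, 3.0]
print(arr[raw_name])

# What is lost is only the dotted-attribute form: the name is not a
# valid Python identifier, so it cannot be written after a dot.
print(raw_name.isidentifier())
```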
