[SciPy-User] Scikits TimeSeries modifies names of features (dtypes)

Pierre GM pgmdevlist@gmail....
Fri Mar 26 12:12:24 CDT 2010


On Mar 26, 2010, at 11:38 AM, Sergi Pons Freixes wrote:
> On Wed, Mar 24, 2010 at 5:27 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
>> 
>> Actually, this is not a tsfromtxt issue, but a np.genfromtxt one (the main function tsfromtxt is derived from). By giving names in the input, you indicate that you want a structured array (with named fields). Structured arrays can also be viewed as recarrays, where named fields can be accessed as attributes. Obviously, having a space, a comma, a period or any other non-alphanumeric character in the field names will prevent access by attribute from working. For that reason, np.genfromtxt checks whether the names of the fields are valid, and modifies them if they're not (by replacing any non-alphanumeric character with an underscore). Think of it as a feature, not a bug.
> 
> Ok. At first, I also thought about the possibility of "they're
> removing strange characters for the sake of simplicity", but I was
> shocked when creating the array directly worked:
> 
> In [9]: x = np.array([(1,2),(3,4)], dtype=[('Label.1', '<i4'), ('Label (2)', '<i4')])
> 
> In [10]: x
> Out[10]:
> array([(1, 2), (3, 4)],
>      dtype=[('Label.1', '<i4'), ('Label (2)', '<i4')])
> 
> In [11]: x['Label (2)']
> Out[11]: array([2, 4])
> 
> So, no cleaning in this case... :S. It's funny that it has been
> implemented on np.genfromtxt.

Yes, you can define a structured array with non-alphanumeric characters in the field names. However, the names will be corrected if you attempt to create a recarray. np.genfromtxt follows the latter convention.
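
To make the difference between the two access styles concrete, here is a minimal sketch. It reuses the field names from the example above; the identifier-like names 'Label_1' and 'Label_2' are hypothetical and chosen only for illustration. It shows key-based access and attribute access only, and does not try to reproduce any automatic renaming.

```python
import numpy as np

# Field names taken from the example earlier in this thread.
x = np.array([(1, 2), (3, 4)],
             dtype=[('Label.1', '<i4'), ('Label (2)', '<i4')])

# Key-based access accepts any string as a field name.
col = x['Label (2)']

# Attribute syntax needs a valid Python identifier, so "x.Label (2)"
# cannot even be written. str.isidentifier() makes this visible.
valid = {name: name.isidentifier() for name in x.dtype.names}

# With identifier-like names (hypothetical, for illustration only),
# a recarray view exposes the fields as attributes.
y = np.array([(1, 2), (3, 4)],
             dtype=[('Label_1', '<i4'), ('Label_2', '<i4')])
rec = y.view(np.recarray)
col_attr = rec.Label_2
```

Both 'Label.1' and 'Label (2)' fail the identifier check, which is why they can only be reached through the x['...'] form.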


>> Now, I can agree that force-feeding this attempt at foolproof behavior to the user is a bit controlling. After all, if you decided to use a name like "Agr.fil. G. corsicum 200x" as field name, you must have a very good reason, and won't mind not being able to use recarrays.
> 
> I agree that the field names are not "the best choice", but they are
> automatically created from a txt file, and are automatically used in
> the code (i.e.,  I never need to type "Agr.fil. G. corsicum 200x"),

Which is exactly the case where an automatic validation of the field names comes in handy, if you decide to make a recarray out of your structured array...


> 
>> So, what I can do is leave the automatic validation of names as the default, but introduce an option to bypass it. That'd be OK for everybody ?
> 
> It's ok for me.

'K then, I'll work on that this afternoon.
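
For readers wondering what bypassing the validation could look like, here is a rough sketch. It assumes a NumPy version in which np.genfromtxt accepts the deletechars and replace_space keywords; whether those keywords are the option discussed in this thread is an assumption on my part, and the inline data is made up for the example.

```python
import io
import numpy as np

# Made-up two-column data, for illustration only.
text = "1,2\n3,4\n"
names = ["Label.1", "Label (2)"]

# Default behaviour: the field names are sanitized on the way in.
cleaned = np.genfromtxt(io.StringIO(text), delimiter=",",
                        dtype=int, names=names)

# Assumption: deleting no characters and mapping spaces to spaces
# leaves the names as given.
raw = np.genfromtxt(io.StringIO(text), delimiter=",",
                    dtype=int, names=names,
                    deletechars="", replace_space=" ")

print(cleaned.dtype.names)
print(raw.dtype.names)
```

With the names left untouched, the fields stay reachable through raw['Label (2)'], but not through attribute access.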


