[SciPy-user] roadmap/plans for timeseries package

matt mattknox_ca@hotmail....
Sun Jan 6 22:07:06 CST 2008


> I am also interested because Mr. Millman mentioned that timeseries
> will be taken out of the sandbox to a place called scikit (what's that BTW?)

For the definition of what a scikit is, I suggest reading some of the recent
threads on this topic. I'll refrain from providing you with my own definition as
it is likely to be wrong :)

Definitions aside, yes the timeseries module will probably be moving to a scikit
at some point in the not too distant future. I'd kind of like the maskedarray
module to finish moving into numpy before we make the timeseries module a
scikit, but we may get kicked out of the sandbox before then, so we'll see.

I haven't really spoken to Pierre about the details of this yet, but I suspect
we'll probably start doing actual "releases" once we move to a scikit, and
providing Windows binaries.

> * an example data set for at least one year at a high temporal resolution:
> 15min or at least 1h. Having such a common data set, one could set up
> tutorial examples and debug or ask questions more easily because all will
> have the same (non-confidential) data on the disk.

There was talk about sample data being included in scipy a while ago, but I'm
not sure if this ever got anywhere. I agree that it is worthwhile, especially
for a timeseries module.

> * being able to access the datetime information for calculations:
> as I understand from my observation, the datetime information is not
> really a part of the array and therefore not really available as
> a reference in calculations. An example:
> One has to get rainfall intensity during early morning hours. For such a
> filter, the information on the corresponding hours is necessary. Is
> this already possible?

If I understand you correctly, then yes, this is already possible. Observe...

>>> import timeseries as ts
>>> data = ts.time_series(range(100), start_date=ts.now('hourly'))
>>> hours = data.hour
>>> filtered_data = data[(hours < 7) & (hours > 3)]
>>> filtered_data
timeseries([ 6  7  8 30 31 32 54 55 56 78 79 80],
           dates = [07-Jan-2008 04:00 07-Jan-2008 05:00 07-Jan-2008 06:00
08-Jan-2008 04:00 08-Jan-2008 05:00 08-Jan-2008 06:00 09-Jan-2008 04:00
09-Jan-2008 05:00 09-Jan-2008 06:00 10-Jan-2008 04:00 10-Jan-2008 05:00
10-Jan-2008 06:00],
           freq  = H)
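
For anyone without the sandbox timeseries package installed, the same
hour-of-day filter can be sketched with only the standard datetime module and
numpy. This is just an illustration of the idea, not how the timeseries module
does it internally; the fixed start date and the names values, dates and mask
are assumptions chosen so the result lines up with the session above.

```python
import datetime as dt

import numpy as np

# Assumed start date: an hourly series beginning 06-Jan-2008 22:00,
# standing in for ts.now('hourly') in the session above.
start = dt.datetime(2008, 1, 6, 22)
dates = [start + dt.timedelta(hours=i) for i in range(100)]
values = np.arange(100)

# Pull the hour out of every date so it can be used in a boolean filter.
hours = np.array([d.hour for d in dates])

# Keep observations strictly between 03:00 and 07:00 (hours 4, 5 and 6).
mask = (hours < 7) & (hours > 3)
filtered_values = values[mask]
filtered_dates = [d for d, keep in zip(dates, mask) if keep]

for d, v in zip(filtered_dates, filtered_values):
    print(d.strftime("%d-%b-%Y %H:%M"), v)
```

The point is the same in both versions: the hours are just another array of
the same length as the data, so any boolean condition on them can be used as an
index.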


> * The maskedarray function is not really well documented when it
> comes to information beyond what's in the docstrings. There are no
> examples of how to mask data in an array based upon different criteria. I
> find it kinda hard to get the data in a datetime object in a clean manner.

Documentation will get better eventually. Realistically, this is probably still
the "early adopter" stage. Not sure what you mean by "find it kinda hard to get
the data in a datetime object in a clean manner". Can you elaborate on that more?
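
On masking by criteria, here is a rough sketch of the kind of thing that is
possible. It assumes the masked array code as exposed under numpy.ma (where the
maskedarray module is headed); the rainfall numbers, the -999.0 missing-value
marker and the 10.0 threshold are made up for illustration.

```python
import numpy as np
import numpy.ma as ma

# Made-up rainfall readings; -999.0 is an assumed "missing value" marker.
rain = np.array([0.0, 1.2, -999.0, 3.4, 0.0, 12.5])

# Criterion 1: mask every entry equal to the missing-value marker.
valid = ma.masked_values(rain, -999.0)

# Criterion 2: additionally mask readings above an assumed threshold.
# masked_where keeps the entries that were already masked in `valid`.
moderate = ma.masked_where(rain > 10.0, valid)

# Statistics skip masked entries; compressed() returns the unmasked data.
print(moderate.count())
print(moderate.compressed())
print(moderate.mean())
```

Any boolean array of the right shape can serve as the condition, so the same
pattern works for criteria built from dates or hours as well.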

> Well, that's all so far. I haven't gotten into plotting time series more 
> than what's in the wiki. I guess it's mainly knowing some matplotlib.

Yeah, I would recommend spending some time learning matplotlib first before
doing timeseries plotting. Word of caution... the timeseries plotting stuff does
not currently support frequencies higher than daily (e.g. hourly, minutely,
etc.). Support for these frequencies could be added without too much trouble,
but we just haven't gotten around to it yet.

> 
> Can I already submit bug/feature requests for time series?
> Where?

Once it becomes a scikit there may be a more systematic way of doing this, but
for now I would recommend just emailing Pierre or myself.

- Matt


