[SciPy-dev] time series implementation approach
pgmdevlist at gmail.com
Tue Dec 12 17:40:24 CST 2006
On Tuesday 12 December 2006 18:20, Matt Knox wrote:
> > How does either implementation handle unevenly spaced points (both
> > within and between series) and when the series are in different units
> > (such as days versus weeks)?
> > I read from Pierre's email that 'the series are regularly spaced' but
> > I do think you need to address it sooner rather than later because it may
> > show major flaws in one implementation. With uneven spacing, you
> > probably need a sparse structure to avoid wasting resources.
> > With different units, one of the series must be converted either by
> > the user or by code (which could be rather complex to get correct).
The way we're going, a series is regularly spaced with a given frequency.
Combination of series (for example, addition) is valid only if the series
have the same frequency. If not, some conversion should take place (we're
still discussing the best way to do it, that's tricky indeed), but it'll be up
to the user to decide which series should be converted.
About gaps in the data. Well, that's yet another point of discussion.
I think that the series should be made regular, with the smallest reasonable
frequency. Gaps are then masked.
For example, I have to work with daily values from a station that didn't
record anything for periods of up to several years at a time. In order to
do even something as simple as plotting the data, I had to convert the data to
a daily timestep and mask the dates for which no data was recorded. Yes,
that's a waste of resources, but it's far easier than trying to figure out when I
have to jump from one timestep to another. And it gives the possibility to
work directly with moving averages, plots, and so forth. And I can always
select a period on which I don't have missing data to play with methods that
can't handle them.
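
A minimal sketch of the gap-masking idea described above, using plain
numpy.ma. Integer day numbers stand in for real dates, and the function name
fill_gaps is hypothetical (it is not part of the sandbox code):

```python
import numpy as np
import numpy.ma as ma

def fill_gaps(days, values):
    """Put irregular daily records on a regular daily grid, masking the gaps.

    `days` are integer day numbers (an assumed stand-in for Date objects).
    """
    days = np.asarray(days)
    values = np.asarray(values, dtype=float)
    start, end = days.min(), days.max()
    grid = np.arange(start, end + 1)          # one slot per day, no holes
    data = np.zeros(grid.shape, dtype=float)
    mask = np.ones(grid.shape, dtype=bool)    # everything masked by default
    data[days - start] = values               # drop in the recorded values
    mask[days - start] = False                # and unmask only those days
    return grid, ma.array(data, mask=mask)

# Records exist for days 0-2 and 6-7; days 3-5 were never recorded.
days = [0, 1, 2, 6, 7]
rain = [1.0, 0.0, 3.5, 2.0, 4.0]
grid, series = fill_gaps(days, rain)
```

The result has one entry per day, so it can be plotted or averaged directly;
masked entries are simply skipped by reductions such as series.mean().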
[thinking aloud: yeah, we could handle uneven series by storing the
timestep (expressed as date.relativedelta, for example) along with the data,
just like a mask is stored along with the data in a masked array. But then, to
combine 2 series, you would have to check for the compatibility of their
timesteps as well. That's getting messy. Nah, the easiest is really to stick
with regular freqs, and provide functions to fill the gaps with ma.masked]
> > Also, what you really mean by 'blah = series1 + series2'?
> > Do you mean concatenation as with strings, or summation as with
> > numbers, or some sort of merging of values?
> I mean element-wise addition.
> behind the scenes what happens is that series1 and series2 are resized so
> that their indices match up, and then the underlying masked arrays are just
> added together as normal. You can take a look at my example script in the
> scipy sandbox if you want a clearer idea of how the current design works.
You end up with a series that starts in 01/01/2005 and ends in 01/04/2005, but
where the first and last data are masked.
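
The resize-then-add behaviour can be sketched roughly as follows. This is
only an illustration under stated assumptions: integer positions stand in for
dates, both series are assumed to share the same frequency, and the names
resize and add_aligned are hypothetical, not the sandbox API:

```python
import numpy as np
import numpy.ma as ma

def resize(start, data, new_start, new_end):
    """Embed `data` (beginning at `start`) in the span [new_start, new_end),
    masking every position the original series did not cover."""
    data = ma.array(data, dtype=float)
    out_data = np.zeros(new_end - new_start)
    out_mask = np.ones(new_end - new_start, dtype=bool)
    offset = start - new_start
    out_data[offset:offset + len(data)] = ma.getdata(data)
    out_mask[offset:offset + len(data)] = ma.getmaskarray(data)
    return ma.array(out_data, mask=out_mask)

def add_aligned(start1, data1, start2, data2):
    """Element-wise sum over the union of both spans."""
    start = min(start1, start2)
    end = max(start1 + len(data1), start2 + len(data2))
    total = resize(start1, data1, start, end) + resize(start2, data2, start, end)
    return start, total

# series1 covers positions 0..2, series2 covers positions 1..3.
start, total = add_aligned(0, [1.0, 2.0, 3.0], 1, [10.0, 20.0, 30.0])
# `total` spans positions 0..3; only the overlap (1 and 2) is unmasked.
```

Because a masked array sum is masked wherever either operand is masked, the
positions covered by only one of the two series come out masked.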
> > Many of the time series methods can be applied to other series than
> > just time. Are you going to allow other types of series?
> no reason not to, but I'm mainly concerned with figuring out a good
> structure to the TimeSeries and Date classes right now that will provide a
> good foundation to build on.
I agree with Matt. Here, we're interested in objects that keep temporal
information along with some regular information (be it amount of rain, or share
value). What I think you mean by time series methods (autocorrelation
coefficients, for example) is yet another problem, and it has nothing to do with
dates per se (more with masked arrays...)