[SciPy-user] [timeseries] Missing dates
Pierre GM
pgmdevlist@gmail....
Sun Apr 5 12:46:02 CDT 2009
On Apr 5, 2009, at 1:05 PM, Christiaan Putter wrote:
> Hi there,
> <...>
>
> So how would I go about doing fill missing dates on a timeseries
> with records?
Well, you found a bug. It won't be corrected before the first release
(in the next 48h), but it will be fixed in the days that follow.
Meanwhile, here's a workaround:
>>> import numpy as np
>>> import scikits.timeseries as ts
>>> # Number of dates needed to cover the whole range without gaps
>>> newlength = series.dates[-1] - series.dates[0] + 1
>>> # Empty series with the same record dtype, starting on the same date
>>> newseries = ts.time_series(np.empty(newlength, dtype=series.dtype),
...                            start_date=series.dates[0])
>>> # Fill the missing dates one field at a time
>>> for n in series.dtype.names:
...     newseries[n] = series[n].fill_missing_dates()
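For readers without scikits.timeseries at hand, the idea behind the workaround (allocate a full-length container, then fill it one field at a time) can be sketched with plain NumPy. This is only an illustration under assumptions: dates are represented as sorted integer day numbers, and the helper name `fill_missing_days` is hypothetical, not part of any library.

```python
import numpy as np
import numpy.ma as ma


def fill_missing_days(days, records):
    """Hypothetical helper: spread `records` over a gap-free range of days.

    `days` is a sorted 1-D integer array (one entry per record) and
    `records` is a structured array. Days absent from `days` are masked.
    """
    days = np.asarray(days)
    newlength = days[-1] - days[0] + 1      # same length formula as the workaround
    full_days = np.arange(days[0], days[-1] + 1)
    positions = days - days[0]              # slot of each existing record

    # Start from a fully masked container with the same record dtype...
    filled = ma.masked_all(newlength, dtype=records.dtype)
    # ...then fill it field by field.
    for name in records.dtype.names:
        column = ma.masked_all(newlength, dtype=records.dtype[name])
        column[positions] = records[name]   # unmasks only the known days
        filled[name] = column
    return full_days, filled


# Records exist for days 10, 11 and 14; days 12 and 13 are missing.
days = np.array([10, 11, 14])
records = np.array([(1.0, 5), (2.0, 6), (3.0, 7)],
                   dtype=[("price", float), ("volume", int)])
full_days, filled = fill_missing_days(days, records)
```

In this sketch the missing days end up masked in every field, which is roughly what one would expect from filling missing dates on each field separately.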
>
>
>
> Besides that I managed to use timeseries as a data source in chaco
> plots. Basic functionality is there, though I'm sure some guys at
> enthought will be able to improve on it. For their tick axis to work
> with dates properly I had to convert series.dates into posix
> timestamps:
>
> index = [time.mktime(t.timetuple()) for t in series.dates.tolist()]
> time_ds = ArrayDataSource(index)
>
> Not very efficient though I guess.
>
> In chaco a plot's index and values are in separate data sources, which
> means timeseries with different frequencies require different indexes.
> I'll talk with one of their devs to see what's the best approach to
> deal with that.
>
> I'm assuming you merge the index axis somehow in matplotlib when
> plotting series with different frequencies?
When you plot a timeseries, the frequency of the plot is the frequency
of the series.
If you use multiple series with different frequencies, they're all
converted to the plot frequency. If the plot frequency is still
undefined, it is set to the frequency of the first series, and the
others are converted to that frequency.
In other words:
* either convert all the series to the same frequency beforehand
(using .asfreq, not .convert),
* or plot the series with the highest frequency first, then the
others.
> So if I were to plot
> something with a business frequency for the entire year 2009, and on
> the same figure plot a series with daily frequency just in January I'm
> guessing weekends will show up as gaps in the first plot?
Nope: because you plotted the business-frequency series first, there
won't be any gaps on week-ends. Instead, your second series will be
internally expressed in business days, so you'll have several points
falling on the same date (the week-end points land on Fridays).
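The collapse onto Fridays can be illustrated with the standard library alone. This is a sketch, not scikits.timeseries code: the helper `to_business_day` is hypothetical and assumes that Saturday and Sunday are moved back to the preceding Friday, which is one reading of the behaviour described above.

```python
from collections import Counter
from datetime import date, timedelta


def to_business_day(d):
    """Hypothetical helper: map Saturday/Sunday back to the preceding Friday."""
    # weekday(): Monday == 0 ... Friday == 4, Saturday == 5, Sunday == 6
    overshoot = max(0, d.weekday() - 4)
    return d - timedelta(days=overshoot)


# One week of daily dates, Monday 5 Jan 2009 to Sunday 11 Jan 2009.
daily = [date(2009, 1, 5) + timedelta(days=i) for i in range(7)]
business = [to_business_day(d) for d in daily]

# Count how many daily points end up on each business date.
points_per_day = Counter(business)
```

Under that assumption, three daily points (Friday, Saturday and Sunday) share the date 9 Jan 2009, while Monday to Thursday keep one point each.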