[SciPy-user] [timeseries] Missing dates
Fri Apr 3 18:23:21 CDT 2009
On Apr 3, 2009, at 6:46 PM, Christiaan Putter wrote:
> Hi guys,
> I've been playing with timeseries for the last hour or so and it's
> pretty cool. Still lots of things I have to go through.
Write your experience down and keep us posted, that'd be great
material for a FAQ.
> In the one plotting example (using yahoo finance) I saw that one can
> fill missing dates before plotting so that the missing ones get
> masked. Though when applying some moving windows functions that
> caused all periods that were affected by the missing values to also
> become masked, which isn't the behaviour I was expecting. It does
> make sense to do it that way though.
> I'm working with stock
> prices, so the "missing" dates over the weekends will increase file
> size by more than 30%. Is there any other reason to fill in missing
> dates besides plotting?
Most functions (like convert or align_series) do require consecutive
dates, hence the need for fill_missing_dates. This function ensures
that your dates are all consecutive, and the values corresponding to
the initially missing dates are set to the constant `masked`.
For a moving average (for example), if you don't fill the
dates, you may end up grouping values separated by more than your
window size, which may not give the results you'd expect.
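To make the moving-average point concrete, here is a rough sketch in plain numpy.ma. It is not the scikits.timeseries API: the manual filling step and the `moving_mean` helper are hypothetical stand-ins, written on the assumption that a window containing any masked value should itself come out masked.

```python
import numpy as np
import numpy.ma as ma

# Months present in the series (8 and 9 are missing), one value per month.
months = np.array([1, 2, 3, 4, 5, 6, 7, 10, 11, 12])
values = np.arange(10, dtype=float)

# Hand-rolled stand-in for filling the dates: one slot per month, gaps masked.
filled = ma.masked_all(12, dtype=float)
filled[months - 1] = values

def moving_mean(x, window):
    """Trailing mean (hypothetical helper); a window with any masked value is masked."""
    out = ma.masked_all(len(x), dtype=float)
    for i in range(window - 1, len(x)):
        chunk = x[i - window + 1:i + 1]
        if ma.count_masked(chunk) == 0:
            out[i] = chunk.mean()
    return out

# Without filling, the 3-point window ending at October mixes June, July and October.
unfilled_avg = moving_mean(ma.array(values), 3)

# With filling, every window that overlaps the gap is masked instead.
filled_avg = moving_mean(filled, 3)
```

In the unfilled case the window ending in October silently spans five calendar months; in the filled case that same window is masked, which is the behaviour Christiaan observed.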
For the case of plotting: when plotting a series w/ missing dates
using some line between the points, you'll connect existing points
(eg, Friday w/ Monday). If you want a separation between Fridays and
Mondays, fill the dates first.
OK, one picture is worth a lot of words, so compare the two plots:
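>>> import numpy as np
>>> import scikits.timeseries as ts
>>> import scikits.timeseries.lib.plotlib as tpl
>>> # Monthly series for 2001 with August and September missing
>>> months = (1, 2, 3, 4, 5, 6, 7, 10, 11, 12)
>>> dates = ts.date_array(['2001-%02i' % i for i in months], freq='M')
>>> s = ts.time_series(np.arange(10), dates=dates)
>>> tpl.tsplot(s, 'o-b')                       # line joins July directly to October
>>> tpl.tsplot(s.fill_missing_dates(), 's-r')  # gap left where dates are missing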
> The question I'm trying to get at though is if I'm going to store my
> timeseries in hdf5 will I fill in the missing dates before I do so, or
> only do that whenever I plot the timeseries?
If you need to save space, don't save the series with the masked
entries for the missing dates. You can use the `compressed` method to
get rid of those before saving, and use fill_missing_dates after loading.
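The round trip can be sketched with plain numpy.ma as below. This only illustrates the idea and does not call scikits.timeseries: the integer `months` array stands in for the date array, and rebuilding the grid by hand stands in for fill_missing_dates. The hdf5 step itself is left out.

```python
import numpy as np
import numpy.ma as ma

# A "filled" series: 12 monthly slots, with months 8 and 9 masked.
months = np.arange(1, 13)
filled = ma.array(np.arange(12, dtype=float))
filled[[7, 8]] = ma.masked

# Before saving: keep only the valid points and the dates that go with them.
keep = ~ma.getmaskarray(filled)
saved_months = months[keep]
saved_values = filled.compressed()

# After loading: rebuild the regular grid and leave the gaps masked.
restored = ma.masked_all(12, dtype=float)
restored[saved_months - 1] = saved_values
```

Only 10 of the 12 slots are stored, and the restored array has the same mask and the same valid values as the original.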