[SciPy-User] scikits.timeseries for many, large, independent and irregular time series

Wes McKinney wesmckinn@gmail....
Sun Oct 23 19:40:47 CDT 2011


On Sun, Oct 23, 2011 at 6:28 PM, Neil Hodgson <hodgson.neil@yahoo.co.uk> wrote:
> David,
>
> I've been doing some work with something like this.
>>> 1. I've seen posts discussing converting irregular timeseries to
>>> "proper" regularly spaced TimeSeries data.
> I have been keeping my eye on the excellent looking Pandas and
> scikits.timeseries (which plan to consolidate, see
> http://pandas.sourceforge.net/timeseries.html), but for the reason you
> describe above I've also so far stuck to some home-grown code.  It seems
> like lots of methods would need adapting to cope with non-uniformly sampled
> data (more common in geosciences compared to financial data for example).
>  I've been waiting for NumPy 1.7 and the new datetime64 dtype before
> investing any serious time and energy into even thinking about it.
>>> 2. Some computations could involve very large TimeSeries objects.
> Here, I am using PyTables, with datetimes stored as float64.  I think it's
> perfect for what you describe.  (Pandas also already uses PyTables as an
> optional I/O platform.)
> Hope that helps and I am interested to see what other people are doing,
> Neil

As the guy behind pandas, I am admittedly biased but can confirm that
it is very good for (in fact, largely designed for) irregularly spaced
time series (and all kinds of labeled / structured data) and in large
part lacks the rigidity of scikits.timeseries. So I would strongly
recommend giving it a try before going down the path of building your
own data structures. I will be continuing to actively develop and
support pandas over the coming years, so having more users providing
feedback on functionality would be very useful for me.
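
For anyone who wants a concrete picture of the irregular-spacing point, here is a minimal sketch. It assumes a current pandas release (the API has changed since this message was written), and the two "sensor" series and their timestamps are made up for illustration.

```python
import pandas as pd

# Two hypothetical sensors sampled at irregular, only partly overlapping times.
a = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(
        ["2011-10-01 00:03", "2011-10-01 00:17", "2011-10-01 01:42"]
    ),
)
b = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.to_datetime(
        ["2011-10-01 00:17", "2011-10-01 00:55", "2011-10-01 01:42"]
    ),
)

# Arithmetic aligns on the union of the timestamps; a time present in only
# one of the series yields NaN instead of raising an error.
total = a + b

# Optional: put the irregular samples of `a` on a regular 30-minute grid,
# averaging whatever falls into each bin (empty bins become NaN).
regular = a.resample("30min").mean()
```

Nothing here requires the data to be regular to begin with; the regular grid is only produced at the end, if a downstream method needs it.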

Lately I've built the support infrastructure to enable very fast
operations involving datetime64-indexed data. All of this is
available: if you have an ordered int64-based index (datetime64 values
are int64 under the hood), alignment operations and merging / joining
operations will be extremely fast. I wrote about this here last month:
http://wesmckinney.com/blog/?p=232
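
A rough sketch of the kind of alignment and join meant here, written against a current pandas release rather than the 2011 code base. The sizes, the random data and the variable names are all assumptions for illustration; no timings are claimed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = pd.Timestamp("2011-01-01")

# Hypothetical data: two sorted, irregular sets of second offsets from `base`.
offsets_x = np.sort(rng.choice(100_000, size=10_000, replace=False))
offsets_y = np.sort(rng.choice(100_000, size=10_000, replace=False))

x = pd.Series(
    rng.standard_normal(10_000),
    index=base + pd.to_timedelta(offsets_x, unit="s"),
)
y = pd.Series(
    rng.standard_normal(10_000),
    index=base + pd.to_timedelta(offsets_y, unit="s"),
)

# Both indexes are ordered, which is the case the message describes.
assert x.index.is_monotonic_increasing and y.index.is_monotonic_increasing

# Outer alignment: both results share the union of the two indexes.
x_aligned, y_aligned = x.align(y, join="outer")

# Inner join: keep only the timestamps present in both series.
joined = pd.concat({"x": x, "y": y}, axis=1, join="inner")
```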

If you need to use float64-based indexes, it should be straightforward
to make a similarly-fast index data structure (more or less a
copy-paste job of the Int64Index, changing the Cython functions to use
their float64 counterparts).
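
As a hedged aside for readers on a current pandas release: float64 labels are accepted as an index directly there, so the copy-paste step described above may not be needed. The sketch below assumes such a release, and the timestamps (float seconds since an arbitrary epoch) are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps stored as float64 seconds since some epoch.
t1 = np.array([0.0, 0.25, 1.5, 4.0])
t2 = np.array([0.25, 1.5, 2.75])

s1 = pd.Series([1.0, 2.0, 3.0, 4.0], index=pd.Index(t1, dtype="float64"))
s2 = pd.Series([10.0, 20.0, 30.0], index=pd.Index(t2, dtype="float64"))

# Labels present in both indexes.
common = s1.index.intersection(s2.index)

# Arithmetic aligns on the union of the float labels; labels missing from
# either side produce NaN.
product = s1 * s2
```

Exact float equality decides what counts as "the same time" here, so this only behaves well when both series were stamped from the same clock values.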

Sometime between now and the end of the year I am going to integrate
datetime64 more thoroughly, essentially eliminating
datetime.datetime-based indexing for most practical use cases. This
should be pretty straightforward to do but will require some care to
ease the transition for legacy systems built on datetime.datetime
indexing.

PyTables is an excellent storage option, especially if your data is
largely static. pandas provides a dict-like HDFStore class for storing
time series data, which may not be a bad place to start.
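
A minimal sketch of the dict-like usage, assuming a current pandas release with the optional PyTables package installed. The helper name `roundtrip`, the key "ts" and the file name are hypothetical; the helper returns None when PyTables is missing.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def roundtrip(series, key="ts"):
    """Write a Series to a temporary HDF5 file and read it back.

    Returns None if the optional PyTables dependency is not installed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.h5")  # hypothetical file name
        try:
            with pd.HDFStore(path, mode="w") as store:
                store[key] = series   # dict-like assignment writes to disk
                return store[key]     # dict-like lookup reads it back
        except ImportError:
            return None


# Hypothetical irregular series used only to exercise the helper.
ts = pd.Series(
    np.arange(4, dtype="float64"),
    index=pd.to_datetime(
        ["2011-10-01", "2011-10-03", "2011-10-04", "2011-10-09"]
    ),
)
restored = roundtrip(ts)
```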

best,
Wes

