[SciPy-User] [ANN] pandas 0.1, a new NumPy-based data analysis library
Matt Knox
mattknox.ca@gmail....
Wed Dec 30 17:03:07 CST 2009
Wes McKinney <wesmckinn <at> gmail.com> writes:
> I don't think you were asking this, but I have gotten this question
> from others. We should probably have a broader discussion about
> handling time series data particularly given the recent datetime dtype
> addition to NumPy.
Agreed. I think once the numpy datetime dtype matures a bit, it would be
worthwhile to have a "meeting of the minds" on the future of time series
data in Python in general. In the meantime, I think it is very healthy to have
some different approaches out in the wild (scikits.timeseries, pandas, nipy
timeseries) to allow people to flesh out ideas, see what works, what doesn't,
where there is overlap, etc. Hopefully we can then unite the efforts and not
end up with a confusing landscape of multiple time series packages like R has.
However, I think any specific interoperability work between the packages is
premature until the final vision is a bit clearer.
> for adding two scikits.timeseries.TimeSeries
> "
> When the second input is another TimeSeries object, the two series
> must satisfy the following conditions:
>
> * they must have the same frequency;
> * they must be sorted in chronological order;
> * they must have matching dates;
> * they must have the same shape.
> "
>
> pandas does not know or care about the frequency, shape, or sortedness
> of the two TimeSeries. If the above conditions are met, it will bypass
> the "matching logic" and go at NumPy vectorized binary op speed. But
> if you break one of the above conditions, it will still match dates
> and produce a TimeSeries result.
Believe it or not, what you just described is along the lines of how the
original scikits.timeseries prototype behaved. It drew inspiration from the
"FAME 4GL" time series language. FAME does all of the frequency / shape
matching implicitly. It was decided (by the two-person committee of Pierre and
me) that this behaviour felt a little too alien relative to the standard numpy
array objects so we went back to the drawing board and used a more conservative
approach. That is to say, frequency conversion and alignment must be done
explicitly in the scikits.timeseries module. In practice, I don't find this to
be a burden and like the extra clarity in the code, but it really depends what
kind of problems you are solving, and certainly personal preference and
experience play a big role.
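To make the contrast concrete, here is a toy sketch in plain Python. It is not
the pandas or scikits.timeseries API: add_strict and add_aligned are made-up
names, series are just parallel lists of dates and values, and filling
unmatched dates with NaN is my assumption about what an implicit approach
would reasonably do.

```python
import math
from datetime import date


def add_strict(dates_a, values_a, dates_b, values_b):
    """Explicit style: refuse to add unless the dates already match."""
    if dates_a != dates_b:
        raise ValueError("dates do not match; align the series explicitly first")
    return dates_a, [x + y for x, y in zip(values_a, values_b)]


def add_aligned(dates_a, values_a, dates_b, values_b):
    """Implicit style: match on dates, with NaN where one side has no value."""
    if dates_a == dates_b:
        # Fast path: nothing to match, so add element-wise.
        return dates_a, [x + y for x, y in zip(values_a, values_b)]
    a = dict(zip(dates_a, values_a))
    b = dict(zip(dates_b, values_b))
    # Slow path: take the union of dates, in chronological order.
    all_dates = sorted(set(a) | set(b))
    nan = float("nan")
    return all_dates, [a.get(d, nan) + b.get(d, nan) for d in all_dates]


d1, d2, d3 = date(2009, 12, 28), date(2009, 12, 29), date(2009, 12, 30)

# The two series overlap only on d2.
dates, values = add_aligned([d1, d2], [1.0, 2.0], [d2, d3], [10.0, 20.0])

# The strict version rejects the same inputs.
try:
    add_strict([d1, d2], [1.0, 2.0], [d2, d3], [10.0, 20.0])
    strict_failed = False
except ValueError:
    strict_failed = True
```

With the strict version the caller has to decide how to align before adding,
which is the extra clarity I mentioned; the aligned version makes that choice
silently.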
At any rate, looking forward to seeing how the pandas module evolves and
hopefully we can collaborate at some point in the future.
- Matt