[SciPy-User] [ANN] pandas 0.1, a new NumPy-based data analysis library

Wes McKinney wesmckinn@gmail....
Wed Dec 30 09:45:56 CST 2009


On Wed, Dec 30, 2009 at 8:36 AM, Tim Michelsen
<timmichelsen@gmx-topmail.de> wrote:
> Hello,
> thanks for the announcement.
>
>> * Date tools: objects for expressing date offsets or generating date
>>  ranges; some functionality similar to scikits.timeseries
> Why do you create data structures similar to scikits.timeseries?
> Couldn't you reuse the functionality from scikits.timeseries?

I think there are two relevant questions here:

  - Why don't I use scikits.timeseries's Date and DateArray objects?

In pandas I wanted to keep working with Python datetime objects, and I
needed objects to encapsulate generic date shifts (like "add 5
business days"). The idea was to extend the dateutil.relativedelta
concept to handle business days, last business day of month, etc. Once
you've done that, generating date ranges is a fairly trivial (albeit
not super efficient) next step. The DateRange class is also a valid
Index for a Series or DataFrame object and requires no conversion (I
plan to write more about this in the docs when I get a chance).
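
To make the offset idea concrete, here is a minimal sketch using only
the standard library. It is not the pandas implementation: the names
BusinessDayOffset and generate_dates are hypothetical, holidays are
ignored, and a "business day" is assumed to mean Monday through Friday.

```python
from datetime import datetime, timedelta


class BusinessDayOffset:
    """Hypothetical offset: shift a datetime by n business days (Mon-Fri)."""

    def __init__(self, n=1):
        self.n = n

    def apply(self, dt):
        # Walk one calendar day at a time, counting only weekdays.
        step = timedelta(days=1 if self.n >= 0 else -1)
        remaining = abs(self.n)
        while remaining > 0:
            dt += step
            if dt.weekday() < 5:  # Monday=0 ... Friday=4
                remaining -= 1
        return dt


def generate_dates(start, end, offset):
    """Yield dates from start to end (inclusive) by repeatedly applying offset.

    The start date is yielded as given, even if it is not a business day.
    """
    if offset.n <= 0:
        raise ValueError("offset must move forward in time")
    current = start
    while current <= end:
        yield current
        current = offset.apply(current)


# "Add 5 business days" to a plain Python datetime.
shifted = BusinessDayOffset(5).apply(datetime(2009, 12, 30))

# A business-day range, built by applying the offset over and over.
dates = list(generate_dates(datetime(2009, 12, 30),
                            datetime(2010, 1, 5),
                            BusinessDayOffset(1)))
```

Stepping one day at a time is the "not super efficient" part: it is
simple and works with any offset that knows how to apply itself, at the
cost of a Python-level loop per generated date.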

  - Why don't I use scikits.timeseries for the time series data itself?

I don't think you were asking this, but I have gotten this question
from others. We should probably have a broader discussion about
handling time series data, particularly given the recent datetime dtype
addition to NumPy. In any case, there are many reasons why I didn't
use it; the main one is that I wanted a unified data model (i.e. the
same basic class) for both time series and cross-sectional data. The
scikits.timeseries TimeSeries object behaves too differently for that.
Here's one example:

http://pytseries.sourceforge.net/core.timeseries.operations.html#binary-operations

The scikits.timeseries documentation on adding two TimeSeries objects says:
"
When the second input is another TimeSeries object, the two series
must satisfy the following conditions:

        * they must have the same frequency;
        * they must be sorted in chronological order;
        * they must have matching dates;
        * they must have the same shape.
"

pandas does not know or care about the frequency, shape, or sortedness
of the two TimeSeries. If the above conditions are met, it will bypass
the "matching logic" and go at NumPy vectorized binary op speed. But
if you break one of the above conditions, it will still match dates
and produce a TimeSeries result. If you break the conditions above
with scikits.timeseries, you will get a MaskedArray result and lose
all of your date information (correct me if I'm wrong).
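
The matching behavior can be sketched in plain Python. This is only an
illustration, not pandas code: add_series is a hypothetical helper, a
series is modeled as a dict mapping datetime to float, and filling
unmatched dates with NaN is an assumption of this sketch.

```python
import math
from datetime import datetime


def add_series(left, right):
    """Add two date-keyed series (dicts of datetime -> float)."""
    # Fast path: identical dates in identical order, so no matching is
    # needed and the values can be added position by position.
    if list(left) == list(right):
        return {d: left[d] + right[d] for d in left}

    # Slow path: match on the union of dates; a date missing from
    # either side gets NaN (an assumption of this sketch).
    result = {}
    for d in sorted(set(left) | set(right)):
        if d in left and d in right:
            result[d] = left[d] + right[d]
        else:
            result[d] = math.nan
    return result


a = {datetime(2009, 12, 30): 1.0, datetime(2009, 12, 31): 2.0}
b = {datetime(2009, 12, 31): 10.0, datetime(2010, 1, 1): 20.0}

# Dates differ, so the slow path runs and the result is still keyed
# by date rather than collapsing to a bare array.
total = add_series(a, b)
```

The point is that the result keeps its dates either way; the fast path
only skips the matching work when the two sets of dates already agree.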

pandas's TimeSeries-specific functionality could definitely be much
improved, but I think the easier option for now would be to provide an
interface between the two libraries, as you suggest:

> Could you see chances to design an interface between both packages?
> I have a lot of timeseries code. I would love to reuse that together
> with your package.

Designing a bridge interface between the two packages would probably
be pretty easy and fairly desirable. If you could give me some
examples of what you're doing in your time series code, that would be
helpful.

> Thanks in advance for clarifications,
> Timmie

