[Numpy-discussion] fixing up datetime

Wes McKinney wesmckinn@gmail....
Wed Jun 8 04:57:07 CDT 2011


On Wed, Jun 8, 2011 at 7:36 AM, Chris Barker <Chris.Barker@noaa.gov> wrote:
> On 6/7/11 4:53 PM, Pierre GM wrote:
>>  Anyhow, each time you
>> read 'frequency' in scikits.timeseries, think 'unit'.
>
> or maybe "precision" -- when I think of a unit, I think of something that
> can be represented as a floating point value -- but here, with integers,
> it's the precision that can be represented. Just a thought.
>
>> Well, it can be argued that the epoch is 0...
>
> yes, but that really should be transparent to the user -- what epoch is
> chosen should influence as little as possible (e.g. only the range of
> values representable)
>
>> Mmh. How would you define a quarter unit ? [3M] ? But then, what if
>> you want your year to start in December, say (we often use
>> DJF/MAM/JJA/SON as a way to decompose a year in four 'hydrological'
>> seasons, for example)
>
> And the federal fiscal year is Oct - Sept, so the first quarter is (Oct,
> Nov, Dec) -- clearly that needs to be flexible.
>
>
> -Chris
>
>
>
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker@noaa.gov

You guys' discussion is a bit overwhelming for me in my currently
jet-lagged state ( =) ) but I thought I would comment on a couple of
things, especially now with the input of another financial Python user
(great!).

Note that I use scikits.timeseries very little for a few reasons (a
bit OT, but...):

- Fundamental need to be able to work with multiple time series,
especially performing operations involving cross-sectional data
- I think it's a bit hard for lay people to use (read: ex-MATLAB/R
users). This is just my opinion, but a few years ago I thought about
using it and concluded that teaching people how to properly use it (a
precision tool, indeed!) was going to cause me grief.
- The data alignment problem, best explained in code:

In [8]: ts
Out[8]:
2000-01-05 00:00:00    0.0503706684002
2000-01-12 00:00:00    -1.7660004939
2000-01-19 00:00:00    1.11716758554
2000-01-26 00:00:00    -0.171029995265
2000-02-02 00:00:00    -0.99876580126
2000-02-09 00:00:00    -0.262729046405

In [9]: ts.index
Out[9]:
<class 'pandas.core.daterange.DateRange'>
offset: <1 Week: kwds={'weekday': 2}, weekday=2>, tzinfo: None
[2000-01-05 00:00:00, ..., 2000-02-09 00:00:00]
length: 6

In [10]: ts2 = ts[:4]

In [11]: ts2.index
Out[11]:
<class 'pandas.core.daterange.DateRange'>
offset: <1 Week: kwds={'weekday': 2}, weekday=2>, tzinfo: None
[2000-01-05 00:00:00, ..., 2000-01-26 00:00:00]
length: 4

In [12]: ts + ts2
Out[12]:
2000-01-05 00:00:00    0.1007413368
2000-01-12 00:00:00    -3.5320009878
2000-01-19 00:00:00    2.23433517109
2000-01-26 00:00:00    -0.34205999053
2000-02-02 00:00:00    NaN
2000-02-09 00:00:00    NaN
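
To make the alignment idea concrete without any library, here is a minimal
sketch in plain Python. It is not how pandas implements it; align_add and
the sample values are made up for illustration. The point is only that the
result is built over the union of the labels, and labels missing from
either side come out as NaN:

```python
import math
from datetime import date, timedelta


def align_add(left, right):
    """Add two {date: value} mappings label by label.

    Labels present in only one input get NaN, mirroring the NaN rows
    in the ts + ts2 output above.
    """
    labels = sorted(set(left) | set(right))
    return {
        label: left.get(label, math.nan) + right.get(label, math.nan)
        for label in labels
    }


# Six weekly Wednesdays starting 2000-01-05 (made-up values).
start = date(2000, 1, 5)
dates = [start + timedelta(weeks=i) for i in range(6)]
ts = dict(zip(dates, [0.5, -1.5, 1.0, -0.25, -1.0, -0.5]))

# ts2 holds only the first four observations, deliberately out of order.
ts2 = {d: ts[d] for d in [dates[2], dates[0], dates[3], dates[1]]}

total = align_add(ts, ts2)
for label, value in total.items():
    print(label, value)
```

Neither input needs to know anything about frequency or ordering; the
labels alone decide which values get added together.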

Either ts or ts2 could be completely DateRange-naive (i.e. have no
way of knowing that they are fixed-frequency), or even out of order,
and stuff like this will work no problem. I view the "fixed frequency"
issue as sort of an afterthought-- if you need it, it's there for you
(the DateRange class is a valid Index--"label vector"--for pandas
objects, and provides an API for defining custom time deltas). Which
leads me to:

- Inability to derive custom offsets:

With pandas I can do:

In [14]: ts.shift(2, offset=2 * datetools.BDay())
Out[14]:
2000-01-11 00:00:00    0.0503706684002
2000-01-18 00:00:00    -1.7660004939
2000-01-25 00:00:00    1.11716758554
2000-02-01 00:00:00    -0.171029995265
2000-02-08 00:00:00    -0.99876580126
2000-02-15 00:00:00    -0.262729046405

or even generate, say, 5-minutely or 10-minutely date ranges thusly:

In [16]: DateRange('6/8/2011 5:00', '6/8/2011 12:00',
   ....:           offset=datetools.Minute(5))
Out[16]:
<class 'pandas.core.daterange.DateRange'>
offset: <5 Minutes>, tzinfo: None
[2011-06-08 05:00:00, ..., 2011-06-08 12:00:00]
length: 85
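
For anyone who wants to see what a custom offset boils down to, here is a
rough sketch using only the standard library. BusinessDays, shift_labels
and fixed_range are hypothetical names invented for this example (they are
not pandas or datetools APIs), holidays are ignored, and only non-negative
offsets are handled:

```python
from datetime import date, datetime, timedelta


class BusinessDays:
    """Hypothetical offset: n weekdays (Mon-Fri), holidays ignored."""

    def __init__(self, n=1):
        self.n = n

    def __rmul__(self, k):
        # Allows 2 * BusinessDays(), in the spirit of 2 * datetools.BDay().
        return BusinessDays(self.n * k)

    def apply(self, day):
        """Return `day` moved forward by self.n weekdays."""
        remaining = self.n
        while remaining > 0:
            day += timedelta(days=1)
            if day.weekday() < 5:  # 0-4 are Monday-Friday
                remaining -= 1
        return day


def shift_labels(series, periods, offset):
    """Move every label of a {date: value} mapping by periods * offset."""
    step = periods * offset
    return {step.apply(day): value for day, value in series.items()}


def fixed_range(start, end, step):
    """Inclusive list of timestamps from start to end, `step` apart."""
    stamps = []
    current = start
    while current <= end:
        stamps.append(current)
        current += step
    return stamps


# Two of the weekly labels from the session above, with made-up values.
ts = {date(2000, 1, 5): 0.5, date(2000, 1, 12): -1.5}
shifted = shift_labels(ts, 2, 2 * BusinessDays())
print(sorted(shifted))

stamps = fixed_range(datetime(2011, 6, 8, 5, 0),
                     datetime(2011, 6, 8, 12, 0),
                     timedelta(minutes=5))
print(len(stamps))
```

The shifted labels land on 2000-01-11 and 2000-01-18, and the 5-minute
range has 85 entries, which agrees with the two sessions above.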

I'm currently working on high-performance reduceat-based resampling
methods (e.g. converting secondly data to 5-minutely data).
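
As a rough illustration of the reduceat idea (a sketch only, not the
pandas code I am working on), the following bins made-up one-second data
into 5-minute sums and means with NumPy. It assumes the timestamps are
sorted and that every bin contains at least one sample, since
np.add.reduceat returns a single element rather than 0 for an empty bin:

```python
import numpy as np

# 20 minutes of made-up one-second data: offsets in seconds, plus values.
seconds = np.arange(20 * 60)
values = seconds.astype(float)

bin_width = 5 * 60  # five minutes, expressed in seconds

# Left edge of each 5-minute bin, then the index of the first sample
# at or after each edge (requires sorted timestamps).
edges = np.arange(0, seconds[-1] + 1, bin_width)
starts = np.searchsorted(seconds, edges)

# One pass over the data: sum each slice values[starts[i]:starts[i+1]],
# with the last slice running to the end of the array.
sums = np.add.reduceat(values, starts)
counts = np.diff(np.append(starts, values.size))
means = sums / counts

print(sums)
print(means)
```

Other reductions work the same way by swapping the ufunc, e.g.
np.maximum.reduceat for the per-bin high.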

So in summary, w.r.t. time series data and datetime, the only things I
care about from a datetime / pandas point of view are:

- Ability to easily define custom timedeltas
- Generate datetime objects, or some equivalent, which can be used to
back pandas data structures
- (possible now??) Ability to have a set of frequency-naive dates
(possibly not in order).

This last point actually matters. Suppose you wanted to get the 5
worst-performing days in the S&P 500 index:

In [7]: spx.index
Out[7]:
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessDay>, tzinfo: None
[1999-12-31 00:00:00, ..., 2011-05-10 00:00:00]
length: 2963

# sorting by value leaves the dates out of order, but this is OK
In [8]: spx.order()[:5]
Out[8]:
2008-10-15 00:00:00    -0.0903497960942
2008-12-01 00:00:00    -0.0892952780505
2008-09-29 00:00:00    -0.0878970494885
2008-10-09 00:00:00    -0.0761670761671
2008-11-20 00:00:00    -0.0671229140321
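
The same selection can be sketched with nothing but the standard library,
which shows why the result has to be allowed to carry frequency-naive,
unordered dates. The returns below are made up for illustration and are
not S&P 500 data:

```python
import heapq
from datetime import date

# Made-up daily returns keyed by business day.
returns = {
    date(2011, 1, 3): 0.011,
    date(2011, 1, 4): -0.013,
    date(2011, 1, 5): 0.005,
    date(2011, 1, 6): -0.021,
    date(2011, 1, 7): -0.002,
    date(2011, 1, 10): -0.034,
    date(2011, 1, 11): 0.008,
    date(2011, 1, 12): -0.017,
}

# The five smallest returns, each still paired with its own date.
worst = heapq.nsmallest(5, returns.items(), key=lambda item: item[1])

for day, value in worst:
    print(day, value)
```

The dates in worst are ordered by return, not by time, and there is no
fixed spacing between them, so whatever holds them cannot insist on a
frequency.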

- W

