[Numpy-discussion] fixing up datetime

Wes McKinney wesmckinn@gmail....
Wed Jun 8 05:05:49 CDT 2011


On Wed, Jun 8, 2011 at 11:57 AM, Wes McKinney <wesmckinn@gmail.com> wrote:
> On Wed, Jun 8, 2011 at 7:36 AM, Chris Barker <Chris.Barker@noaa.gov> wrote:
>> On 6/7/11 4:53 PM, Pierre GM wrote:
>>>  Anyhow, each time you
>>> read 'frequency' in scikits.timeseries, think 'unit'.
>>
>> or maybe "precision" -- when I think of unit, I think of something that
>> can be represented as a floating point value -- but here, with integers,
>> it's the precision that can be represented. Just a thought.
>>
>>> Well, it can be argued that the epoch is 0...
>>
>> yes, but that really should be transparent to the user -- what epoch is
>> chosen should influence as little as possible (e.g. only the range of
>> values representable)
>>
>>> Mmh. How would you define a quarter unit ? [3M] ? But then, what if
>>> you want your year to start in December, say (we often use
>>> DJF/MAM/JJA/SON as a way to decompose a year in four 'hydrological'
>>> seasons, for example)
>>
>> And the federal fiscal year is Oct - Sept, so the first quarter is (Oct,
>> Nov, Dec) -- clearly that needs to be flexible.
>>
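
For what it's worth, a quarter with a movable year start is easy to
sketch in plain Python. quarter_of and its first_month parameter are
names made up for illustration; nothing like this exists in datetime64
or scikits.timeseries as far as this thread shows:

```python
def quarter_of(month, first_month=1):
    """Return the 1-based quarter of `month` when the year starts in `first_month`.

    Both arguments are calendar months in 1..12. Hypothetical helper,
    for illustration only.
    """
    if not (1 <= month <= 12 and 1 <= first_month <= 12):
        raise ValueError("months must be in 1..12")
    # Months elapsed since the start of the shifted year, 0..11.
    offset = (month - first_month) % 12
    return offset // 3 + 1


# Federal fiscal year (Oct - Sept): Oct, Nov, Dec all land in Q1.
fiscal = [quarter_of(m, first_month=10) for m in (10, 11, 12, 1, 9)]

# DJF/MAM/JJA/SON seasons: the "year" starts in December.
seasons = [quarter_of(m, first_month=12) for m in (12, 1, 2, 3, 11)]
```

The only state needed is the first month of the year, which suggests
the anchor could be a parameter of the unit rather than a new unit.
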
>>
>> -Chris
>>
>>
>>
>>
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> Chris.Barker@noaa.gov
>>
>
> Your guys' discussion is a bit overwhelming for me in my currently
> jet-lagged state ( =) ) but I thought I would comment on a couple
> things, especially now with the input of another financial Python user
> (great!).
>
> Note that I use scikits.timeseries very little for a few reasons (a
> bit OT, but...):
>
> - Fundamental need to be able to work with multiple time series,
> especially performing operations involving cross-sectional data
> - I think it's a bit hard for lay people to use (read: ex-MATLAB/R
> users). This is just my opinion, but a few years ago I thought about
> using it and concluded that teaching people how to properly use it (a
> precision tool, indeed!) was going to cause me grief.
> - The data alignment problem, best explained in code:
>
> In [8]: ts
> Out[8]:
> 2000-01-05 00:00:00    0.0503706684002
> 2000-01-12 00:00:00    -1.7660004939
> 2000-01-19 00:00:00    1.11716758554
> 2000-01-26 00:00:00    -0.171029995265
> 2000-02-02 00:00:00    -0.99876580126
> 2000-02-09 00:00:00    -0.262729046405
>
> In [9]: ts.index
> Out[9]:
> <class 'pandas.core.daterange.DateRange'>
> offset: <1 Week: kwds={'weekday': 2}, weekday=2>, tzinfo: None
> [2000-01-05 00:00:00, ..., 2000-02-09 00:00:00]
> length: 6
>
> In [10]: ts2 = ts[:4]
>
> In [11]: ts2.index
> Out[11]:
> <class 'pandas.core.daterange.DateRange'>
> offset: <1 Week: kwds={'weekday': 2}, weekday=2>, tzinfo: None
> [2000-01-05 00:00:00, ..., 2000-01-26 00:00:00]
> length: 4
>
> In [12]: ts + ts2
> Out[12]:
> 2000-01-05 00:00:00    0.1007413368
> 2000-01-12 00:00:00    -3.5320009878
> 2000-01-19 00:00:00    2.23433517109
> 2000-01-26 00:00:00    -0.34205999053
> 2000-02-02 00:00:00    NaN
> 2000-02-09 00:00:00    NaN
>
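
To spell out what In [12] is doing: the result is labelled by the union
of the two indexes, and any label missing from one side comes out as
NaN. A toy version with plain dicts follows. add_aligned is a made-up
helper and the numbers are invented; pandas itself works on Index
objects, not dicts:

```python
def add_aligned(left, right):
    """Add two label -> value mappings, aligning on the labels.

    Labels present in only one input yield NaN. Hypothetical helper,
    not the pandas implementation.
    """
    nan = float("nan")
    # ISO date strings sort chronologically, so sorted() is enough here.
    labels = sorted(set(left) | set(right))
    return {k: left.get(k, nan) + right.get(k, nan) for k in labels}


ts = {"2000-01-05": 0.5, "2000-01-12": -1.75, "2000-01-19": 1.0}
# Shorter, and deliberately out of order.
ts2 = {"2000-01-12": -1.75, "2000-01-05": 0.5}

total = add_aligned(ts, ts2)
```

Nothing in the sketch depends on the labels being evenly spaced or
sorted on input.
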
> Either ts or ts2 could be completely DateRange-naive (i.e. they have no
> way of knowing that they are fixed-frequency), or even out of order,
> and stuff like this will work no problem. I view the "fixed frequency"
> issue as sort of an afterthought-- if you need it, it's there for you
> (the DateRange class is a valid Index--"label vector"--for pandas
> objects, and provides an API for defining custom time deltas). Which
> leads me to:
>
> - Inability to derive custom offsets:
>
> I can do:
>
> In [14]: ts.shift(2, offset=2 * datetools.BDay())
> Out[14]:
> 2000-01-11 00:00:00    0.0503706684002
> 2000-01-18 00:00:00    -1.7660004939
> 2000-01-25 00:00:00    1.11716758554
> 2000-02-01 00:00:00    -0.171029995265
> 2000-02-08 00:00:00    -0.99876580126
> 2000-02-15 00:00:00    -0.262729046405
>
> or even generate, say, 5-minutely or 10-minutely date ranges thusly:
>
> In [16]: DateRange('6/8/2011 5:00', '6/8/2011 12:00',
>                    offset=datetools.Minute(5))
> Out[16]:
> <class 'pandas.core.daterange.DateRange'>
> offset: <5 Minutes>, tzinfo: None
> [2011-06-08 05:00:00, ..., 2011-06-08 12:00:00]
> length: 85
>
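
For comparison, the same 5-minute grid can be built with nothing but
the standard library, which also confirms the length of 85 shown
above. fixed_range is a made-up helper and says nothing about how
DateRange is implemented:

```python
from datetime import datetime, timedelta


def fixed_range(start, end, step):
    """Return datetimes from start to end inclusive, spaced by step."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    out = []
    current = start
    while current <= end:
        out.append(current)
        current += step
    return out


stamps = fixed_range(datetime(2011, 6, 8, 5, 0),
                     datetime(2011, 6, 8, 12, 0),
                     timedelta(minutes=5))

# 7 hours * 12 steps per hour, plus the closing endpoint.
n = len(stamps)
```
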
> I'm currently working on high perf reduceat-based resampling methods
> (e.g. converting secondly data to 5-minutely data).
>
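
A rough sketch of what reduceat-based resampling can look like, for
the easy case only. Assumptions: the secondly data is evenly spaced
with no gaps, so every 5-minute bin holds exactly 300 samples; the
data itself is made up. Real code would have to find the bin edges
from the timestamps, and this is not the pandas implementation:

```python
import numpy as np

# One hour of made-up secondly data: the values 0..3599.
secondly = np.arange(3600, dtype=np.float64)

# Start offset of each 5-minute bin (300 samples per bin).
bin_starts = np.arange(0, secondly.size, 300)

# reduceat reduces each slice [bin_starts[i], bin_starts[i+1]);
# the last slice runs to the end of the array.
sums = np.add.reduceat(secondly, bin_starts)

# Samples per bin, so that sums can be turned into means.
counts = np.diff(np.append(bin_starts, secondly.size))
means = sums / counts
```

Swapping np.add for np.maximum or np.minimum gives per-bin highs and
lows from the same bin_starts.
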
> So in summary, w.r.t. time series data and datetime, the only things I
> care about from a datetime / pandas point of view:
>
> - Ability to easily define custom timedeltas
> - Generate datetime objects, or some equivalent, which can be used to
> back pandas data structures
> - (possible now??) Ability to have a set of frequency-naive dates
> (possibly not in order).
>
> This last point actually matters. Suppose you wanted to get the 5
> worst-performing days in the S&P 500 index:
>
> In [7]: spx.index
> Out[7]:
> <class 'pandas.core.daterange.DateRange'>
> offset: <1 BusinessDay>, tzinfo: None
> [1999-12-31 00:00:00, ..., 2011-05-10 00:00:00]
> length: 2963
>
> # but this is OK
> In [8]: spx.order()[:5]
> Out[8]:
> 2008-10-15 00:00:00    -0.0903497960942
> 2008-12-01 00:00:00    -0.0892952780505
> 2008-09-29 00:00:00    -0.0878970494885
> 2008-10-09 00:00:00    -0.0761670761671
> 2008-11-20 00:00:00    -0.0671229140321
>
> - W
>

I should add that if datetime64 gets me 80% of the way to meeting my
needs (which are rather domain-specific), I will be very happy.
Reducing the memory footprint of long time series (versus having
millions of datetime.datetime objects lying around) will also be a big
benefit.
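
A quick way to see the difference, written against the datetime64
dtype as it exists in released NumPy (the API was still in flux when
this thread was written, so treat it as a sketch). The series length
and the microsecond unit are arbitrary choices:

```python
import sys
from datetime import datetime, timedelta

import numpy as np

n = 100_000
start = datetime(2000, 1, 1)

# One Python object per timestamp, plus a list of pointers to them.
objs = [start + timedelta(days=i) for i in range(n)]
obj_bytes = sys.getsizeof(objs) + sum(sys.getsizeof(d) for d in objs)

# One 64-bit integer per timestamp in a single contiguous buffer.
arr = np.array(objs, dtype="datetime64[us]")
arr_bytes = arr.nbytes

print(obj_bytes, arr_bytes)
```

The array costs a flat 8 bytes per element; the object version pays
for a full Python object plus a pointer for every timestamp.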

