[Numpy-discussion] RFC: A (second) proposal for implementing some date/time types in NumPy

Matt Knox mattknox.ca@gmail....
Fri Jul 25 20:22:14 CDT 2008


>> For this goal, we are proposing a decoupling of the date/time use cases 
>> in two different groups:
>>
>> 1. A pure ``datetime`` dtype (absolute or relative) that would be useful 
>> for timestamping purposes in general (i.e. registering dates without a 
>> need that they be evenly spaced in time).

I agree with this split. A basic datetime data type would be useful to a lot
of people who don't need fancier time series capabilities. I would recommend
focusing on implementing this first as it will probably provide lots of useful
learning experiences and examples for the more complicated task of a
"frequency" aware date type later on.

>> 2. A class based on the ``frequency`` concept that would be useful for 
>> measurements that are done on a regular basis or in business 
>> applications.
>> ...
>> Our ultimate goal is that the ``Date`` and ``DateArray`` classes in the 
>> TimeSeries would be rewritten in terms of the new date/time dtype so as 
>> to get advantage of its features but also for getting rid of duplicated 
>> code.

I'm excited to hear such interest in time series work with python and numpy.
I certainly support the goals, and more collaboration and sharing of code is
always a good thing. My biggest concern is the risk of losing existing
functionality. A decent amount of work went into implementing all the
different frequencies, and losing any of the currently supported frequencies
could mean the difference between the dtype being very useful to someone and
not useful at all.

Just thinking out loud here... but in terms of improving on the Date
implementation in the timeseries module, it would be nice to have a more
"plug in" kind of architecture for implementing different frequencies so
that it could be extended more easily with custom frequencies by other
users. There is no end to the list of frequencies that people might use,
and the current timeseries implementation isn't as flexible as it could be
in that area.
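
To make the "plug in" idea a bit more concrete, here is a minimal sketch in
pure Python. Everything in it is hypothetical and only for illustration: the
registry, the register_frequency decorator, to_ordinal, and the "2M" code are
made-up names, not part of the timeseries module or of the proposal.

```python
import datetime as dt

# Hypothetical registry: maps a frequency code to a function that converts
# a calendar date into an integer period ordinal for that frequency.
_FREQUENCIES = {}


def register_frequency(code):
    """Return a decorator that registers a date -> ordinal converter."""
    def decorator(func):
        _FREQUENCIES[code] = func
        return func
    return decorator


@register_frequency("M")
def monthly(d):
    # Number of whole months counted from year 0, January.
    return d.year * 12 + (d.month - 1)


@register_frequency("Q")
def quarterly(d):
    # Number of whole calendar quarters counted from year 0, Q1.
    return d.year * 4 + (d.month - 1) // 3


def to_ordinal(d, freq):
    """Convert date `d` to its period ordinal at frequency `freq`."""
    try:
        converter = _FREQUENCIES[freq]
    except KeyError:
        raise ValueError("unknown frequency: %r" % (freq,)) from None
    return converter(d)


# A user-defined frequency (two-month periods), added without touching
# any of the code above.
@register_frequency("2M")
def bimonthly(d):
    return d.year * 6 + (d.month - 1) // 2
```

With something like this, a user could add a new frequency by writing one
small function. A real implementation at the dtype level would presumably
need the same kind of hook in C.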

The automatic string parsing has been mentioned before, but it is a feature
I am personally very fond of. I use it all the time, and I suspect a lot of
people would like it very much if they used it. It's not suited for high
performance code, but is fantastic for interactive and ad-hoc work. This is
supported right in the "constructor" of the current Date class, along with
conversion from datetime objects. I'd love to see such support built into the
new date type, although I guess it could be added on easily enough with a
factory function.
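
As a rough illustration of the factory function idea, the sketch below uses
only the standard library datetime module. The name make_date and the list
of accepted formats are assumptions for the example. The parser in the
current Date class is more flexible than this.

```python
import datetime as dt

# Formats tried in order. This list is an assumption for illustration only.
_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%d-%b-%Y", "%Y%m%d")


def make_date(value):
    """Hypothetical factory: build a datetime from a string or date object."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, dt.datetime):
        return value
    if isinstance(value, dt.date):
        return dt.datetime(value.year, value.month, value.day)
    if isinstance(value, str):
        text = value.strip()
        for fmt in _FORMATS:
            try:
                return dt.datetime.strptime(text, fmt)
            except ValueError:
                continue  # try the next format
        raise ValueError("could not parse date string: %r" % (value,))
    raise TypeError("unsupported type: %s" % type(value).__name__)
```

Trying formats one by one is slow, which fits the point above: fine for
interactive use, not for high performance code.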

Another extra feature (or hack depending on your point of view) in the
timeseries Date class is the addition of a couple of extra custom directives
for string formatting. Specifically the %q and %Q directives for printing out
quarter information. Obviously these are non-standard directives, but when you
are talking about dates with custom frequencies I think it sometimes makes
sense to have custom format directives. A plug in architecture that somehow
lets you define new custom directives for various frequencies would also be
really nice.
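
One possible way to layer custom directives on top of the standard strftime
is sketched below. The names (_CUSTOM, format_date) are hypothetical, and
the meanings given to %q and %Q here (quarter number and "Q" plus quarter
number) are assumptions for the example, not necessarily what the
timeseries Date class prints.

```python
import datetime as dt
import re

# Hypothetical table of custom directives: letter -> function of the date.
# New directives could be added by inserting entries into this dict.
_CUSTOM = {
    "q": lambda d: str((d.month - 1) // 3 + 1),
    "Q": lambda d: "Q%d" % ((d.month - 1) // 3 + 1),
}

# Matches "%" plus the following character, so "%%" is consumed as a pair
# and a letter after an escaped percent sign is never treated as a directive.
_DIRECTIVE = re.compile(r"%(.)", re.DOTALL)


def format_date(d, fmt):
    """Expand the custom directives, then hand the rest to strftime."""
    def expand(match):
        key = match.group(1)
        if key in _CUSTOM:
            # Escape "%" in the expansion so strftime leaves it alone.
            return _CUSTOM[key](d).replace("%", "%%")
        return match.group(0)  # standard directive, left for strftime

    return d.strftime(_DIRECTIVE.sub(expand, fmt))
```

For example, format_date(dt.date(2008, 7, 25), "%Y-%Q") gives "2008-Q3".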

Anyway, I'm very much in support of this initiative. I'm not sure I'll be
able to help much on the initial implementation, but once you have a framework
in place I may be able to pitch in with some of the details. Please keep us
posted.

- Matt



