[Numpy-discussion] RFC: A (second) proposal for implementing some date/time types in NumPy
Mon Jul 28 12:16:25 CDT 2008
On Saturday 26 July 2008, Matt Knox wrote:
> >> For this goal, we are proposing a decoupling of the date/time use
> >> cases in two different groups:
> >> 1. A pure ``datetime`` dtype (absolute or relative) that would be
> >> useful for timestamping purposes in general (i.e. registering
> >> dates without a need that they be evenly spaced in time).
> I agree with this split. A basic datetime data type would be useful
> to a lot of people that don't need fancier time series capabilities.
Excellent, this is our thought too.
> I would recommend focusing on implementing this first as it will
> probably provide lots of useful learning experiences and examples for
> the more complicated task of a "frequency" aware date type later on.
Definitely. We plan to do exactly this.
> >> 2. A class based on the ``frequency`` concept that would be useful
> >> for measurements that are done on a regular basis or in business
> >> applications.
> >> ...
> >> Our ultimate goal is that the ``Date`` and ``DateArray`` classes
> >> in the TimeSeries would be rewritten in terms of the new date/time
> >> dtype, both to take advantage of its features and to get rid of
> >> duplicated code.
> I'm excited to hear such interest in time series work with python and
> numpy. I certainly support the goals and more collaboration and
> sharing of code is always a good thing. My biggest concern would be
> not losing existing functionality. A decent amount of work went into
> implementing all the different frequencies, and losing any of the
> currently supported frequencies could mean the difference between the
> dtype being very useful to someone, or not useful at all.
> Just thinking out loud here... but in terms of improving on the Date
> implementation in the timeseries module, it would be nice to have a
> more "plug in" kind of architecture for implementing different
> frequencies so that it could be extended more easily with custom
> frequencies by other users. There is no end to the list of possible
> frequencies that people might potentially use and the current
> timeseries implementation isn't as flexible as it could be in that
> regard.
We completely agree with the idea of the plug-in architecture for the
``Date`` class. Do you have something concrete in mind already?
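To make the plug-in idea a bit more tangible, here is a minimal sketch of a frequency registry. Everything in it is hypothetical: the names ``register_frequency``, ``to_ordinal`` and ``from_ordinal``, and the choice of representing a period as an integer index, are assumptions for illustration only and do not describe the current timeseries code or the proposed dtype.

```python
from datetime import date

# Hypothetical registry: frequency name -> (to_ordinal, from_ordinal) callables.
_FREQUENCIES = {}


def register_frequency(name, to_ordinal_func, from_ordinal_func):
    """Register a custom frequency under `name` (illustrative API only)."""
    _FREQUENCIES[name] = (to_ordinal_func, from_ordinal_func)


def to_ordinal(d, freq):
    """Map a date to the integer index of the period that contains it."""
    return _FREQUENCIES[freq][0](d)


def from_ordinal(n, freq):
    """Map a period index back to the first day of that period."""
    return _FREQUENCIES[freq][1](n)


# Two example plug-ins: monthly and quarterly periods.
register_frequency(
    "M",
    lambda d: d.year * 12 + (d.month - 1),
    lambda n: date(n // 12, n % 12 + 1, 1),
)
register_frequency(
    "Q",
    lambda d: d.year * 4 + (d.month - 1) // 3,
    lambda n: date(n // 4, (n % 4) * 3 + 1, 1),
)
```

With something along these lines, a user could add a new frequency by calling ``register_frequency`` with a pair of conversion functions, without touching the core class.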
> The automatic string parsing has been mentioned before, but it is a
> feature I am personally very fond of. I use it all the time, and I
> suspect a lot of people would like it very much if they used it. It's
> not suited for high performance code, but is fantastic for
> interactive and ad-hoc work. This is supported right in the
> "constructor" of the current Date class, along with conversion from
> datetime objects. I'd love to see such support built into the new
> date type, although I guess it could be added on easily enough with a
> factory function.
Well, what we are planning is to support only three kinds of input:
- From ``datetime.datetime`` (absolute time) or ``datetime.timedelta``
(relative time) objects.
- From integers or floating-point numbers (relative time).
- From ISO-8601 strings (absolute time).
The last input mode does imply a parser, but our intention is to support
directly just the standard ISO. We think that if you want to specify
other string formats it is better to rely on the ``datetime`` parsers
or, as John Hunter suggests, the ``dateutil`` module. We believe that
incorporating more parsers into the ``Date`` class may represent an
unnecessary duplication of code.
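As a rough illustration of these input modes, here is a sketch that uses only the standard ``datetime`` module. The helper name ``coerce_time`` is hypothetical, and treating plain numbers as seconds is an assumption made for the example. Note also that ``datetime.fromisoformat`` accepts only a subset of ISO 8601 on Python versions before 3.11.

```python
from datetime import datetime, timedelta


def coerce_time(value):
    """Hypothetical helper: normalise one input to a datetime or a timedelta."""
    if isinstance(value, (datetime, timedelta)):
        # Already an absolute (datetime) or relative (timedelta) time.
        return value
    if isinstance(value, (int, float)):
        # Numbers are relative times; the unit (seconds) is an assumption.
        return timedelta(seconds=value)
    if isinstance(value, str):
        # Only ISO-8601 strings are parsed directly.
        return datetime.fromisoformat(value)
    raise TypeError("unsupported input type: %r" % type(value))


# Any other string format goes through an existing parser first.
parsed = datetime.strptime("28/07/2008", "%d/%m/%Y")
absolute = coerce_time(parsed)
```

The last two lines show the division of labour: a non-ISO string is handled by ``datetime.strptime`` (or ``dateutil``), and only the resulting ``datetime`` object is handed over.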
> Another extra feature (or hack depending on your point of view) in
> the timeseries Date class is the addition of a couple extra custom
> directives for string formatting. Specifically the %q and %Q
> directives for printing out Quarter information. Obviously these are
> non-standard directives, but when you are talking about dates with
> custom frequencies I think it sometimes makes sense to have custom
> format directives. A plug in architecture that somehow lets you
> define new custom directives for various frequencies would also be
> really nice.
Maybe you are right, yes. However, I'd consider using ``datetime`` or
``dateutil`` for this first. If there are use cases that the existing
modules do not cover, then we can start thinking about this, but not
before.
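For reference, a quarter directive can be layered on top of the standard ``datetime.strftime`` with little code. The sketch below is only an illustration: the registry ``_EXTRA_DIRECTIVES`` and the function ``strftime_ext`` are hypothetical names, and reading ``%q`` as the quarter number (1 to 4) is an assumption. ``%Q`` is left out because its exact meaning is not spelled out in this thread.

```python
from datetime import datetime

# Hypothetical registry of extra directives: letter -> function(datetime) -> str.
_EXTRA_DIRECTIVES = {
    # Assumption: %q means the quarter number, 1 to 4.
    "q": lambda d: str((d.month - 1) // 3 + 1),
}


def strftime_ext(d, fmt):
    """Expand the registered extra directives, then defer to datetime.strftime."""
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            letter = fmt[i + 1]
            if letter in _EXTRA_DIRECTIVES:
                # Escape any '%' in the expansion so strftime leaves it alone.
                out.append(_EXTRA_DIRECTIVES[letter](d).replace("%", "%%"))
            else:
                # Standard directives (and '%%') are passed through untouched.
                out.append(fmt[i:i + 2])
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return d.strftime("".join(out))
```

For example, ``strftime_ext(datetime(2008, 7, 28), "%YQ%q")`` gives ``"2008Q3"``. New directives would be added by inserting further entries into the registry.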
> Anyway, I'm very much in support of this initiative. I'm not sure
> I'll be able to help much on the initial implementation, but once you
> have a framework in place I may be able to pitch in with some of the
> details. Please keep us posted.
Yes, that's the idea. We plan to send a third proposal (tomorrow?)
based on the latest suggestions by Pierre. Once we reach a consensus,
we will start the implementation of the date/time dtype based on the
final proposal (hopefully, the third one). It would be great if, based
on this, and before or during the implementation phase of the dtype,
you can start thinking about the architecture of the new ``Date`` class
(with all the added fanciness that you are proposing) so that we can
have time to include any details missing from the final
proposal for the date/time dtype.
Thanks a lot!