[Numpy-discussion] RFC: A (second) proposal for implementing some date/time types in NumPy

Francesc Alted faltet@pytables....
Fri Jul 25 06:09:33 CDT 2008


Well, as there were no replies to our second proposal for the date/time
dtype, I assume that everybody agrees with it ;-)  At any rate, we would
like to proceed with the implementation phase very soon.

However, Enthought is sponsoring this job, and they clearly stated that
the implementation should cover the needs of as many users as possible.
In particular, we would like some of the heaviest users of date/time
objects, i.e. the TimeSeries authors, to be comfortable with the new
date/time dtypes, and especially to be able to benefit from them.

To this end, we are proposing to decouple the date/time use cases into
two different groups:

1. A pure ``datetime`` dtype (absolute or relative) that would be useful 
for timestamping purposes in general (i.e. registering dates without a 
need that they be evenly spaced in time).

2. A class based on the ``frequency`` concept that would be useful for 
measurements that are done on a regular basis or in business 

With this, we keep the dtype implementation at the core of NumPy from
being cluttered with the relatively complex needs of the ``frequency``
concept users, factoring those out into an external class (``Date``, to
follow the TimeSeries naming convention).  More importantly, this
decoupling also avoids mixing the two concepts, which, although both
are about time measurements, have quite different meanings.
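
To make the split a bit more concrete, here is a minimal sketch of how a
frequency-aware class could sit on top of a plain timestamp.  Everything
in it is an assumption made for illustration: the attribute names, the
two frequency codes ("D" and "M"), and the use of the standard library
``datetime`` as a stand-in for the proposed dtype.  It is not the
TimeSeries ``Date`` API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Date:
    """Hypothetical frequency-aware wrapper around a plain timestamp."""

    stamp: datetime  # the bare timestamp (what the dtype would store)
    freq: str        # "D" (daily) or "M" (monthly) in this sketch

    def period_start(self) -> datetime:
        """First instant of the period that contains ``stamp``."""
        midnight = self.stamp.replace(hour=0, minute=0, second=0, microsecond=0)
        if self.freq == "D":
            return midnight
        if self.freq == "M":
            return midnight.replace(day=1)
        raise ValueError(f"unsupported frequency: {self.freq!r}")

    def period_end(self) -> datetime:
        """First instant after the period (exclusive end)."""
        start = self.period_start()
        if self.freq == "D":
            return start + timedelta(days=1)
        # Monthly: roll over to the first day of the next month.
        if start.month == 12:
            return start.replace(year=start.year + 1, month=1)
        return start.replace(month=start.month + 1)


d = Date(datetime(2008, 7, 25, 6, 9, 33), "M")
print(d.period_start(), d.period_end())
```

The timestamp itself knows nothing about periods; all the frequency
logic lives in the wrapper, which is the kind of separation proposed
here.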

Another important advantage of this distinction is that the ``datetime``
timestamp requires less meta-information to worry about (basically,
the 'resolution' property), while a ``frequency`` à la TimeSeries needs
more, like the 'start' and 'end' of periods, as well as a more complex
way to code frequencies (there are many more time periods to be coded,
as can be seen in [1]_).  This matters a great deal for allowing NumPy
data based on the ``datetime`` dtype to be quickly saved to and
retrieved from databases like ZODB (object database) or PyTables
(HDF5-based database).
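
As a rough illustration of why a single 'resolution' property keeps
storage simple, the sketch below packs timestamps into 64-bit integer
ticks plus one resolution string, and restores them.  The function
names, the payload layout, the epoch and the set of resolutions are all
assumptions for this example; they are not part of the proposal, nor of
the ZODB or PyTables APIs.

```python
from array import array
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed origin for this sketch
# Microseconds per tick for each resolution (hypothetical set).
MICROSECONDS_PER_TICK = {"s": 1_000_000, "ms": 1_000, "us": 1}


def pack(stamps, resolution):
    """Store timestamps as raw int64 ticks plus one resolution string."""
    tick = timedelta(microseconds=MICROSECONDS_PER_TICK[resolution])
    # Floor division truncates anything finer than the chosen resolution.
    ticks = array("q", ((s - EPOCH) // tick for s in stamps))
    return {"resolution": resolution, "ticks": ticks.tobytes()}


def unpack(payload):
    """Rebuild the timestamps from the ticks and the resolution."""
    tick = timedelta(microseconds=MICROSECONDS_PER_TICK[payload["resolution"]])
    ticks = array("q")
    ticks.frombytes(payload["ticks"])
    return [EPOCH + n * tick for n in ticks]


stamps = [
    datetime(2008, 7, 25, 6, 9, 33),
    datetime(2008, 7, 26, 18, 0, 0, 250000),
]
payload = pack(stamps, "ms")
restored = unpack(payload)
```

The only meta-information that travels with the data is the resolution
string; a frequency-based class would also have to persist its period
definitions.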

Our ultimate goal is that the ``Date`` and ``DateArray`` classes in
TimeSeries would be rewritten in terms of the new date/time dtype, so as
to take advantage of its features and also to get rid of duplicated
code.  I honestly think that this can be a big advantage for TimeSeries
(at the cost of taking some time to do the migration).

Does that approach make sense for people?

.. [1] http://scipy.org/scipy/scikits/wiki/TimeSeries#Frequencies

Francesc Alted
