[Numpy-discussion] RFC: A (second) proposal for implementing some date/time types in NumPy
Francesc Alted
faltet@pytables....
Fri Jul 25 06:09:33 CDT 2008
Hi,
Well, as there were no replies to our second proposal for the date/time
dtype, I assume that everybody agrees with it ;-) At any rate, we would
like to proceed with the implementation phase very soon now.
However, Enthought is sponsoring this work, and they have clearly
stated that the implementation should cover the needs of as many users
as possible. In particular, we would like the heaviest users of
date/time objects, i.e. the TimeSeries authors, to be comfortable with
the new date/time dtypes and, especially, to be able to benefit from
them.
To this end, we propose decoupling the date/time use cases into two
different groups:
1. A pure ``datetime`` dtype (absolute or relative) that would be useful
for timestamping purposes in general (i.e. registering dates without
requiring them to be evenly spaced in time).
2. A class based on the ``frequency`` concept that would be useful for
measurements that are done on a regular basis or in business
applications.
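
As a rough illustration of the two use cases, here is a minimal sketch
in plain Python. It uses only the standard ``datetime`` module and is
not the proposed NumPy API; ``regular_series`` is a hypothetical helper
introduced just for this example.

```python
from datetime import datetime, timedelta

# Use case 1: timestamping. Events are recorded whenever they happen,
# so the gaps between consecutive values are arbitrary.
timestamps = [
    datetime(2008, 7, 25, 6, 9, 33),
    datetime(2008, 7, 25, 6, 9, 41),
    datetime(2008, 7, 26, 14, 0, 0),
]
gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]


# Use case 2: frequency-based measurements. A start point and a fixed
# step determine every value, so the step is part of the data's meaning.
def regular_series(start, step, count):
    """Hypothetical helper: `count` evenly spaced points from `start`."""
    return [start + i * step for i in range(count)]


daily = regular_series(datetime(2008, 7, 25), timedelta(days=1), 4)
daily_gaps = {b - a for a, b in zip(daily, daily[1:])}
```

In the first case nothing can be said about the spacing in advance; in
the second, the spacing is a single known value shared by the whole
series.
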
This way, we keep the dtype implementation at the core of NumPy from
becoming cluttered with the relatively complex needs of the
``frequency`` concept users, factoring those out into an external class
(``Date``, to follow the TimeSeries naming convention). More
importantly, this decoupling also avoids mixing the two concepts which,
although both are about time measurements, have quite different
meanings.
Another important advantage of this distinction is that the ``datetime``
timestamp requires less meta-information to worry about (basically,
the 'resolution' property), while a ``frequency`` à la TimeSeries will
need additional meta-information, like the 'start' and 'end' of
periods, as well as a more complex way to code frequencies (there
are many more time periods to be coded, as can be seen in [1]_).
This can be very important for allowing NumPy data based on the
``datetime`` dtype to be quickly saved to and retrieved from databases
like ZODB (object database) or PyTables (HDF5-based database).
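
To make the metadata argument a bit more concrete, here is a small
sketch in plain Python. All names in it (``Timestamps``, ``Period``,
``dump``, ``load``) and the byte layout are assumptions made up for
illustration; they are not part of the proposal, of TimeSeries, or of
any database's storage format.

```python
import struct
from dataclasses import dataclass
from datetime import date


@dataclass
class Timestamps:
    """Hypothetical container: integer ticks plus one resolution tag."""
    resolution: str   # e.g. "us" for microseconds (assumed 1-2 ASCII chars)
    ticks: list       # integer counts of `resolution` units since an epoch


@dataclass
class Period:
    """Hypothetical frequency-style value: needs a code and both bounds."""
    freq: str         # frequency code, e.g. "W" for weekly
    start: date
    end: date


def dump(ts):
    # Two bytes of resolution followed by little-endian 64-bit integers.
    header = ts.resolution.encode("ascii").ljust(2)
    return header + struct.pack(f"<{len(ts.ticks)}q", *ts.ticks)


def load(buf):
    resolution = buf[:2].decode("ascii").strip()
    count = (len(buf) - 2) // 8
    return Timestamps(resolution, list(struct.unpack(f"<{count}q", buf[2:])))


ts = Timestamps("us", [0, 1_000_000, 86_400_000_000])
restored = load(dump(ts))
week = Period("W", date(2008, 7, 21), date(2008, 7, 27))
```

With a single resolution tag, the timestamps reduce to a flat buffer of
integers that round-trips trivially, whereas each ``Period`` carries its
own bounds and frequency code that a storage layer would also have to
understand.
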
Our ultimate goal is for the ``Date`` and ``DateArray`` classes in
TimeSeries to be rewritten in terms of the new date/time dtype, both to
take advantage of its features and to get rid of duplicated code. I
honestly think that this can be a big advantage for TimeSeries (at the
cost of some time spent on the migration).
Does this approach make sense to people?
.. [1] http://scipy.org/scipy/scikits/wiki/TimeSeries#Frequencies
--
Francesc Alted