[Numpy-discussion] RFC: A (second) proposal for implementing some date/time types in NumPy

Francesc Alted faltet@pytables....
Mon Jul 28 11:17:41 CDT 2008


Hi Pierre,

On Friday 25 July 2008, Pierre GM wrote:
> Francesc,
>
> Could you clarify a couple of points ?
>
> [datetime64]
> If I understand properly, your datetime64 would be time units from
> the POSIX epoch (1970/01/01 00:00:00), right ? So
>
> +7d would be 1970/01/08 (7 days after the epoch)
> -7W would be 1969/11/13 (7*7 days before the epoch)
>
> With this approach, a series [1,2,3,7] at a resolution 'd' would
> correspond to 1970/01/01, 1970/01/02, 1970/01/03 and 1970/01/07,
> right ?
>
> I'm all for that, **AS LONG AS we have a business day resolution**
> 'b', so that
> +7b would be 1970/01/09.

We have been analyzing whether to add a business day resolution to the 
proposal, but the problem is that such an entity cannot be considered a 
'resolution' as such: a business day does not have a fixed time-span 
(two days of the week don't count, and that introduces irregular 
behaviour in many situations).
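
To make the irregularity concrete, here is a minimal sketch using only 
the standard-library datetime module.  The helper name 
add_business_days is hypothetical (it is not part of the proposal or of 
NumPy), and it assumes a Monday-to-Friday week with no holidays:

```python
from datetime import date, timedelta

def add_business_days(start, n):
    """Return the date n business days after start (hypothetical helper).

    Assumes Monday-Friday are business days and ignores holidays.
    """
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0..4 are Monday..Friday
            remaining -= 1
    return current

epoch = date(1970, 1, 1)               # a Thursday
first = add_business_days(epoch, 1)    # the Friday
second = add_business_days(epoch, 2)   # the following Monday

# The same step of "one business day" spans a different number of
# calendar days depending on where it starts.
print((first - epoch).days, (second - first).days)
```

The step from Thursday to Friday covers one calendar day, while the 
step from Friday to Monday covers three, so the unit has no fixed 
length in seconds.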

Having said that, it is apparent that the business day is a **strong 
requirement** on your side, and you know that we would like to keep you 
happy.  So, to allow this, we have concluded that a conceptual change 
in our second proposal is needed: instead of a 'resolution', we can 
introduce the 'time unit' concept.  A 'time unit' can be considered as 
an extent of time that doesn't necessarily need to be fixed, but can 
change depending on the context of use.  As the 'time unit' concept has 
this less restrictive meaning, we think that users can be easily 
persuaded that a 'business day' fits this definition (which would be 
difficult/weird to explain using the 'resolution' concept).

We have given this some thought, and while this will certainly add a 
bit of complexity, it should not be too much.  So, yes, we are willing 
to rewrite the proposal with the new 'time unit' concept and include 
the 'business day' too.  With this, we hope to better serve the needs 
of the TimeSeries authors and users.

Also, adding the 'time unit' concept (and its corresponding 
infrastructure) to the dtype opens the door to the adoption of 
other 'XXXX units' inside NumPy so that, for example, people could 
convert from, say, miles to kilometers like this:

lengths_in_miles_array.astype('length[Km]')

but well, this is another story.

> [timedelta64]
> I like your idea of a timedelta64 being relative, but in that case,
> why not having the same resolutions as datetime64 ?

At the beginning, our argument for keeping weeks as the coarsest unit 
for relative times was that the duration of months and years is not 
well defined (a month can last between 28 and 31 days, and a year 365 
or 366 days) for a time that is meant to be *relative* (for example, 
the duration of a relative month is different if the reference time is 
June or July).

However, after thinking more about this, we now think that a relative 
time of months or years does have a clear meaning: it makes a lot of 
sense to say "3 months after July 1998" or "5 months before August 
2008", i.e. such times make complete sense when used in combination 
with an absolute date.  One thing that will not be possible, though, is 
to change the time unit of a relative time expressed in, say, years to 
another time unit such as days.  This is because it is impossible to 
know how many days a relative year has (i.e. one not bound to a given 
year).  More generally, it will not be possible to perform 'time unit' 
conversions between units above and below a relative week (because the 
week is the largest time unit that has a definite number of seconds).
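
As a rough illustration of both points, here is a small sketch in plain 
Python.  The helper add_months is a hypothetical name, not the proposed 
dtype arithmetic, and month-unit dates are simply represented as the 
first day of the month:

```python
from datetime import date

def add_months(d, months):
    """Shift a month-unit date (kept as the 1st of the month) by whole months.

    Hypothetical helper for illustration only.
    """
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    return date(year, month0 + 1, 1)

# Relative months are well defined once anchored to an absolute date.
print(add_months(date(1998, 7, 1), 3))    # 3 months after July 1998
print(add_months(date(2008, 8, 1), -5))   # 5 months before August 2008

# The length in days of "one relative month" depends on the anchor,
# which is why a relative month cannot be converted to days on its own.
june = date(2008, 6, 1)
july = date(2008, 7, 1)
print((add_months(june, 1) - june).days, (add_months(july, 1) - july).days)
```

The first two results are October 1998 and March 2008, and the last 
line shows 30 days for a month counted from June but 31 for one counted 
from July.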

So, yes, we will be adding months and years to the relative times too.

> [scikits.timeseries]
> We can currently perform the following operations in
> scikits.timeseries
>
> >>> import scikits.timeseries as ts
> >>> series = ts.date_array(['1970-01', '1970-02', '1970-09'],
> ...                        freq='M')
> >>> series
>
> DateArray([Jan-1970, Feb-1970, Sep-1970],
>           freq='M')
>
> >>> series.asfreq('A')
>
> DateArray([1970, 1970, 1970],
>           freq='A-DEC')
>
> >>> series.asfreq('A-MAR')
>
> DateArray([1970, 1970, 1971],
>           freq='A-MAR')
> "A-MAR" means that year YY ends on 03/31 and that year (YY+1) starts
> on 04/01.
>
> I use that a lot in my work, when I need to average daily data by
> water years (a water year starts usually on 04/01 and ends the
> following 03/31).
>
> How would I do that with datetime64 and timedelta64 ?

Well, as we would rather not make an 'origin' part of our proposal, you 
won't be able to do exactly that with the proposed plain dtype.  
However, we think that by making sensible use of smaller time units 
(i.e. with more resolution, using the old convention) and a combination 
of absolute and relative times, it is easy to cover this use case.  To 
continue with your example, you will be able to do:

>>> series = numpy.array(['1970-01', '1970-02', '1970-09'],
...                      dtype='T[M]')
>>> series.astype('Y')
array([1970, 1970, 1970], dtype='T8[Y]')

>>> # Add 9 relative months, so that April maps onto January of the
>>> # next year and the year boundary falls after March ('A-MAR')
>>> series2 = series + 9
>>> series2.astype('Y')
array([1970, 1970, 1971], dtype='T8[Y]')

I hope you get the idea.
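
The shift-then-truncate idea can also be sketched in plain Python, 
independent of the proposed dtype.  The helpers shift_months and 
fiscal_year are hypothetical names, and months are kept as (year, month) 
pairs.  For a year that ends in March ('A-MAR'), the shift that moves 
April onto January of the following year is 12 - 3 = 9 months:

```python
def shift_months(year, month, months):
    """Add a number of relative months to a (year, month) pair."""
    total = year * 12 + (month - 1) + months
    y, month0 = divmod(total, 12)
    return y, month0 + 1

def fiscal_year(year, month, end_month):
    """Label (year, month) with the fiscal year that ends in end_month.

    Hypothetical helper: shift so that the month after end_month lands on
    January, then keep only the year part (truncation to a 'Y' unit).
    """
    shifted_year, _ = shift_months(year, month, 12 - end_month)
    return shifted_year

series = [(1970, 1), (1970, 2), (1970, 9)]

calendar_years = [fiscal_year(y, m, end_month=12) for y, m in series]
water_years = [fiscal_year(y, m, end_month=3) for y, m in series]

print(calendar_years)  # [1970, 1970, 1970]
print(water_years)     # [1970, 1970, 1971]
```

With end_month=12 the shift is zero and the result is the ordinary 
calendar year, which matches the 'A-DEC' case in your example.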

> Apart from that, I'd be of course quite happy to help as much as I
> can. P.

Well, I really hope that you will be OK with the modifications that we 
are planning to make for the new (third) proposal.

Many thanks!
Francesc

>
>
> ############################################
>
> On Friday 25 July 2008 07:09:33 Francesc Alted wrote:
> > Hi,
> >
> > Well, as there were no replies to our second proposal for the
> > date/time dtype, I assume that everybody agrees with it ;-)  At any
> > rate, we would like to proceed with the implementation phase very
> > soon now.
> >
> > However, it happens that Enthought is sponsoring this job and they
> > clearly stated that the implementation should cover the needs of as
> > many users as possible.  So, in particular, we would like
> > one of the heaviest users of date/time objects, i.e. the
> > TimeSeries authors, to be comfortable with the new date/time
> > dtypes, and especially to benefit from them.
> >
> > For this goal, we are proposing a decoupling of the date/time use
> > cases in two different groups:
> >
> > 1. A pure ``datetime`` dtype (absolute or relative) that would be
> > useful for timestamping purposes in general (i.e. registering dates
> > without a need that they be evenly spaced in time).
> >
> > 2. A class based on the ``frequency`` concept that would be useful
> > for measurements that are done on a regular basis or in business
> > applications.
> >
> > With this, we are preventing the dtype implementation at the core
> > of NumPy from being too cluttered with the relatively complex needs
> > of the ``frequency`` concept users, factoring it out to an external
> > class (``Date`` to follow the TimeSeries naming convention).  More
> > importantly, this decoupling will also avoid mixing those two
> > concepts that, although both are about time measurements, have
> > quite different meanings.
> >
> > Another important advantage of this distinction is that the
> > ``datetime`` timestamp requires less meta-information to worry
> > about (basically, the 'resolution' property), while a ``frequency``
> > à la TimeSeries will need additional meta-information, like
> > the 'start' and 'end' of periods, as well as a more complex way to
> > code frequencies (there are many more time periods to be coded,
> > as can be seen in [1]_). This can be very important for allowing
> > NumPy data based on the ``datetime`` dtype to be quickly saved
> > and retrieved on databases like ZODB (object database) or PyTables
> > (HDF5-based database).
> >
> > Our ultimate goal is that the ``Date`` and ``DateArray`` classes in
> > TimeSeries would be rewritten in terms of the new date/time
> > dtype so as to take advantage of its features and also to get
> > rid of duplicated code.  I honestly think that this can be a big
> > advantage for TimeSeries (at the cost of taking some time
> > for doing the migration).
> >
> > Does that approach make sense for people?
> >
> > .. [1] http://scipy.org/scipy/scikits/wiki/TimeSeries#Frequencies
>



-- 
Francesc Alted

