[Numpy-discussion] RFC: A proposal for implementing some date/time types in NumPy

Francesc Alted faltet@pytables....
Mon Jul 14 14:12:18 CDT 2008


On Monday 14 July 2008, Christopher Barker wrote:
> Matt Knox wrote:
> > The DateArray class in the timeseries scikits can do part of what
> > you want. Observe...
> >
> > >>> a.year
> >
> > array([2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008,
> > 2008, 2008, 2008, 2008, 2008])
> >
> > >>> a.hour
> >
> > array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,  0,  1])
> >
> > >>> a.day
> >
> > array([12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13])
>
> This is great for what I often need: to output data in a format with
> columns of:
>
> year, month, day, hour, min, sec

I see.  However, the more I think about this, the more I see the need to 
split the date/time functionalities and duties in two parts:

* the first one implementing a date/time dtype with the basic 
functionality for timestamping and/or time-interval measuring.

* the second part would be a specific array container of date/time types
(which may perfectly well be a port of the DateArray from
scikits.timeseries, rebased on the date/time type) where one
can implement all of the functionality (like what you are
proposing above) that is beyond the scope of a humble date/time dtype.

Having this two-layer design will definitely give us a more
powerful and flexible solution in the long term, IMO.

> But I also often need to be able to convert a "TimeDelta" to a
> particular unit, for example (using the lib datetime):
>
>  >>> td = datetime.datetime(2008, 7, 14, 12) - datetime.datetime(2008, 7, 13, 10)
>  >>> td
> datetime.timedelta(1, 7200)
>
> so we have a timedelta of one day, 7200 seconds.
>
> I'd like:
>  >>> td.as_hours
>
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> AttributeError: 'datetime.timedelta' object has no attribute
> 'as_hours'
>
> which doesn't exist in the datetime module, so I do:
>  >>> hours = td.days*24 + td.seconds/3600.
>  >>> hours
>
> 26.0
>
> I find myself writing this code a lot, so I'd love to have it built
> in.

Hmm, having a separate converter for every time unit might be too
much.  I'd prefer the following:

td.as_timeunit('hour')

where you can specify the time unit as the parameter.  Will take note of 
this.
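
To make the idea concrete, here is a minimal sketch of what such a converter could look like when written against the standard ``datetime.timedelta``.  The function name ``as_timeunit``, the unit spellings, and the ``_US_PER_UNIT`` table are only assumptions for illustration; nothing here is an agreed-upon API.

```python
import datetime

# Hypothetical unit table: number of microseconds in each supported unit.
_US_PER_UNIT = {
    'microsecond': 1,
    'millisecond': 1000,
    'second': 1000000,
    'minute': 60 * 1000000,
    'hour': 3600 * 1000000,
    'day': 86400 * 1000000,
}

def as_timeunit(td, unit):
    """Return the timedelta ``td`` expressed in ``unit`` as a float."""
    if unit not in _US_PER_UNIT:
        raise ValueError("unknown time unit: %r" % (unit,))
    # Accumulate integer microseconds first, so the only rounding
    # happens in the final division.
    total_us = (td.days * 86400 + td.seconds) * 1000000 + td.microseconds
    return total_us / float(_US_PER_UNIT[unit])

td = datetime.datetime(2008, 7, 14, 12) - datetime.datetime(2008, 7, 13, 10)
hours = as_timeunit(td, 'hour')   # same value as td.days*24 + td.seconds/3600.
```

A single function taking the unit as a parameter keeps the interface small: supporting a new unit means adding one table entry rather than one more method.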

>
> Which brings up an issue:
>
> The reason it isn't built in is that the philosophy behind the
> datetime module is that it provides the building blocks with which to
> build more feature-full packages. Personally, I really wish it had a
> bit more built in, but what can we do?
>
> As for the numpy datetime types, we need to decide how much to build
> in. I think the kind of functionality described here is pretty basic,
> and should be included, but if we include everyone's idea of basic,
> it could get pretty bloated!

Completely agree.  This is why I'm proposing the two-layer approach:  
have the basic date/time functionality implemented as a dtype (i.e. in 
C space), and put the other niceties into a sort of ``DateArray`` 
(perhaps in Python space).
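
As a rough illustration of the split, the Python-space layer could be a thin container around integer ticks that only adds conveniences such as ``.year`` or ``.hour``.  The class below is purely hypothetical (it is not the scikits.timeseries ``DateArray``), it uses plain lists instead of a real dtype, and the origin month is a guess, since the output quoted earlier only shows year, day and hour.

```python
import datetime

class DateArraySketch(object):
    """Hypothetical Python-level container around integer ticks."""

    def __init__(self, ticks, origin):
        self.ticks = list(ticks)   # integer hours elapsed since ``origin``
        self.origin = origin       # a datetime.datetime

    def _as_datetimes(self):
        # The "basic" layer: ticks plus origin give absolute timestamps.
        return [self.origin + datetime.timedelta(hours=t) for t in self.ticks]

    # The "niceties" layer: field accessors built on top of the basic one.
    @property
    def year(self):
        return [d.year for d in self._as_datetimes()]

    @property
    def day(self):
        return [d.day for d in self._as_datetimes()]

    @property
    def hour(self):
        return [d.hour for d in self._as_datetimes()]

# Assumed origin: 11:00 on the 12th; the month (July) is an assumption.
a = DateArraySketch(range(15), datetime.datetime(2008, 7, 12, 11))
```

With this origin, ``a.hour`` runs from 11 up to 23 and then wraps to 0 and 1, while ``a.day`` switches from 12 to 13 for the last two entries.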

> >  the idea we
> > are incubating is to complement the ``datetime64`` with a
> > 'resolution' metainfo.  The ``datetime64`` will still be based on a
> > int64 type, but the meaning of the 'ticks' would depend on a
> > 'resolution' property.
>
> I like this! Would there be conversion between different resolutions
> available? I wonder what that syntax for that should be?

Well, what about the ".as_timeunit()" method stated above for the date/time
scalar, and a similar one for the ``DateArray`` layer?  However, be
aware that, as we are proposing integer arithmetic for the date/time
types (and not fixed-point or floating-point arithmetic), you *will*
lose precision when changing resolution from a fine-grained time unit
to a coarser one (and conversely, you risk overflow
when changing resolution from a coarse-grained unit to a
finer one), and this may not be what you want.
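
Both effects are easy to show with plain integers.  The sketch below mimics int64 ticks with Python ints and an explicit bound; ``change_resolution`` and the ``TICKS_PER_SECOND`` table are hypothetical names used only for this illustration, not part of the proposal.

```python
INT64_MAX = 2**63 - 1

# Ticks per second for a few resolutions (illustrative table only).
TICKS_PER_SECOND = {'s': 1, 'ms': 10**3, 'us': 10**6, 'ns': 10**9}

def change_resolution(ticks, from_unit, to_unit):
    """Convert an integer tick count between resolutions, int64-style."""
    src = TICKS_PER_SECOND[from_unit]
    dst = TICKS_PER_SECOND[to_unit]
    if dst >= src:
        # Coarse to fine: exact, but the result may not fit in an int64.
        result = ticks * (dst // src)
        if abs(result) > INT64_MAX:
            raise OverflowError("%d %s does not fit in int64 as %s"
                                % (ticks, from_unit, to_unit))
        return result
    # Fine to coarse: floor division drops the sub-unit remainder.
    return ticks // (src // dst)

coarse = change_resolution(2999, 'ms', 's')      # the trailing 999 ms are lost
restored = change_resolution(coarse, 's', 'ms')  # does not round-trip to 2999
```

Going the other way, something like ``change_resolution(10**10, 's', 'ns')`` raises ``OverflowError`` here, because 10**19 nanoseconds exceeds the int64 range.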

> >  And
> > definitely, "offset" would be similar to "origin".  So yes, we will
> > try to introduce both concepts.
>
> yup -- origin is critical!
>
> What resolution (and numerical format) do you use to express the
> origin? Even if your data is in days, you may want to specify the
> origin with more precision, so as not to have confusion about what "0
> days" means in some higher resolution unit. Also, if you want
> picosecond resolution, then the origin needs to be picosecond
> resolution as well.

Good point.  I'm afraid that we will only support the specification of
the origin with a fixed resolution of microseconds, and between the
years 1 and 9999 (mainly for ``datetime`` compatibility, but also to
avoid the 'chicken and egg' effect that you noticed ;-).
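
For illustration only, this is how ticks, a resolution and a microsecond-precision origin might combine, using ``datetime.datetime`` as the origin type (which is where the year 1 to 9999 range comes from).  The helper ``ticks_to_datetime`` and the ``US_PER_TICK`` table are assumptions made for this sketch, not a proposed interface.

```python
import datetime

# Microseconds per tick for a few resolutions (illustrative table only).
US_PER_TICK = {'us': 1, 'ms': 10**3, 's': 10**6, 'day': 86400 * 10**6}

def ticks_to_datetime(ticks, resolution, origin):
    """Interpret ``ticks`` at ``resolution`` relative to ``origin``."""
    offset = datetime.timedelta(microseconds=ticks * US_PER_TICK[resolution])
    # datetime arithmetic raises OverflowError outside years 1..9999.
    return origin + offset

# The data are in days, but the origin carries sub-day (microsecond) detail.
origin = datetime.datetime(2008, 7, 14, 12, 0, 0, 250000)
stamp = ticks_to_datetime(2, 'day', origin)
```

Here "0 days" means exactly 12:00:00.250000 on the origin date, so a tick count in days maps unambiguously onto any resolution down to microseconds, but not finer.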

Cheers,

-- 
Francesc Alted

