[Numpy-discussion] RFC: A proposal for implementing some date/time types in NumPy
Mon Jul 14 14:12:18 CDT 2008
On Monday 14 July 2008, Christopher Barker wrote:
> Matt Knox wrote:
> > The DateArray class in the timeseries scikits can do part of what
> > you want. Observe...
> > >>> a.year
> > array([2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008,
> >        2008, 2008, 2008, 2008, 2008])
> > >>> a.hour
> > array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0, 1])
> > >>> a.day
> > array([12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13])
> This is great for what I often need: to output data in a format with
> columns of:
> year, month, day, hour, min, sec
I see. However, the more I think about this, the more I see the need to
split the date/time functionality and duties into two parts:
* the first one implementing a date/time dtype with the basic
functionality for timestamping and/or time-interval measuring.
* the second part would be a specific array container of date/time types
(which may perfectly well be a port of the DateArray from
scikits.timeseries, rebased on the date/time type) where one
can implement all of the functionality (like what you are
proposing above) that is beyond the scope of a humble date/time dtype.
Definitely, having this two-layer approach is going to allow a more
powerful and flexible approach in the long term, IMO.
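
To make the two layers concrete, here is a minimal pure-Python sketch. It
is only an illustration under assumptions: the class name DateArraySketch,
its constructor arguments and the choice of one hour per tick are all
hypothetical, and it is not the DateArray of scikits.timeseries nor the
proposed dtype. The integers stand in for the dtype layer; the properties
live in the container layer.

```python
import datetime


class DateArraySketch:
    """Hypothetical container layer on top of plain integer ticks."""

    def __init__(self, ticks, origin, unit=datetime.timedelta(hours=1)):
        self.ticks = list(ticks)  # "dtype layer": nothing but integers
        self.origin = origin      # the instant that tick 0 refers to
        self.unit = unit          # duration of one tick (assumed: 1 hour)

    def _datetimes(self):
        # Convert every tick to a datetime only when a field is requested.
        return [self.origin + t * self.unit for t in self.ticks]

    @property
    def year(self):
        return [d.year for d in self._datetimes()]

    @property
    def day(self):
        return [d.day for d in self._datetimes()]

    @property
    def hour(self):
        return [d.hour for d in self._datetimes()]


# Fifteen hourly ticks starting at 2008-07-12 11:00.
a = DateArraySketch(range(15), datetime.datetime(2008, 7, 12, 11))
print(a.hour)
print(a.day)
```

With this origin the hour and day fields roll over at midnight in the same
way as in Matt's example, while the stored data stays a flat sequence of
integers.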
> But I also often need to be able to convert a "TimeDelta" to a
> particular unit, for example (using the lib datetime):
> >>> td = datetime.datetime(2008, 7, 14, 12) - datetime.datetime(2008, 7, 13, 10)
> >>> td
> datetime.timedelta(1, 7200)
> so we have a timedelta of one day, 7200 seconds.
> I'd like:
> >>> td.as_hours
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> AttributeError: 'datetime.timedelta' object has no attribute 'as_hours'
> which doesn't exist in the datetime module, so I do:
> >>> hours = td.days*24 + td.seconds/3600.
> >>> hours
> I find myself writing this code a lot, so I'd love to have it built in.
Hmm, I don't know whether having a converter for every time unit would be
too much. I'd prefer the following:
where you can specify the time unit as the parameter. Will take note of
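
As a rough sketch of a single converter that takes the time unit as a
parameter, here is a stand-alone helper built only on the standard datetime
module. The function name as_timeunit, its signature and the unit strings
are assumptions made for illustration; nothing here is a settled API.

```python
import datetime

# Hypothetical unit table: number of seconds in one unit.
_SECONDS_PER_UNIT = {
    "days": 86400,
    "hours": 3600,
    "minutes": 60,
    "seconds": 1,
}


def as_timeunit(td, unit):
    """Return the timedelta `td` expressed as a float in `unit`."""
    if unit not in _SECONDS_PER_UNIT:
        raise ValueError("unknown time unit: %r" % (unit,))
    # total_seconds() folds days, seconds and microseconds into one float.
    return td.total_seconds() / _SECONDS_PER_UNIT[unit]


td = datetime.datetime(2008, 7, 14, 12) - datetime.datetime(2008, 7, 13, 10)
hours = as_timeunit(td, "hours")  # one day plus 7200 seconds is 26.0 hours
```

This replaces the hand-written td.days*24 + td.seconds/3600. with one call,
and adding a unit only means adding an entry to the table.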
> Which brings up an issue:
> The reason it isn't built in is that the philosophy behind the
> datetime module is that it provides the building blocks with which to
> build more feature-full packages. Personally, I really wish it had a
> bit more built in, but what can we do?
> As for the numpy datetime types, we need to decide how much to build
> in. I think the kind of functionality described here is pretty basic,
> and should be included, but if we include everyone's idea of basic,
> it could get pretty bloated!
Completely agree. This is why I'm proposing the two-layer approach:
have the basic date/time functionality implemented as a dtype (i.e. in
C space), and put the other niceties into a sort of ``DateArray``
(perhaps in Python space).
> > the idea we
> > are incubating is to complement the ``datetime64`` with a
> > 'resolution' metainfo. The ``datetime64`` will still be based on a
> > int64 type, but the meaning of the 'ticks' would depend on a
> > 'resolution' property.
> I like this! Would there be conversion between different resolutions
> available? I wonder what that syntax for that should be?
Well, what about the ".as_timeunit()" stated above for the date/time
scalar and another similar one for the ``DateArray`` layer? However, be
aware that, as we are proposing integer arithmetic for the date/time
types (and not fixed-point or floating-point arithmetic), you *will*
lose precision when changing resolution from a fine-grained time unit
to a more coarse-grained one (and conversely, you risk overflow
when changing resolution from a coarse-grained unit to a more
fine-grained one), and this may not be what you want.
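
Both effects can be seen with plain integers. The sketch below is only an
illustration under assumptions: change_resolution is a hypothetical helper,
resolutions are given as ticks per second, one resolution is assumed to be
an integer multiple of the other, and the int64 range is checked by hand
because Python integers do not overflow on their own.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def change_resolution(ticks, from_per_second, to_per_second):
    """Convert an integer tick count between two resolutions.

    Resolutions are expressed in ticks per second (hypothetical convention).
    """
    if to_per_second >= from_per_second:
        # Coarse to fine: multiply, which can leave the int64 range.
        result = ticks * (to_per_second // from_per_second)
        if not INT64_MIN <= result <= INT64_MAX:
            raise OverflowError("value does not fit in an int64")
        return result
    # Fine to coarse: floor division drops the sub-unit part for good.
    return ticks // (from_per_second // to_per_second)


# 1.5 seconds stored as microseconds, converted to whole seconds.
seconds = change_resolution(1500000, 10**6, 1)
# Converting back does not restore the lost half second.
micros = change_resolution(seconds, 1, 10**6)

# 10**13 seconds expressed in picoseconds needs 10**25 ticks.
try:
    change_resolution(10**13, 1, 10**12)
    overflowed = False
except OverflowError:
    overflowed = True
```

Here seconds is 1 and micros is 1000000 rather than the original 1500000,
and the picosecond conversion is rejected because 10**25 exceeds 2**63 - 1.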
> > And
> > definitely, "offset" would be similar to "origin". So yes, we will
> > try to introduce both concepts.
> yup -- origin is critical!
> What resolution (and numerical format) do you use to express the
> origin? Even if your data is in days, you may want to specify the
> origin with more precision, so as not to have confusion about what "0
> days" means in some higher resolution unit. Also, if you want
> picosecond resolution, then the origin needs to be picosecond
> resolution as well.
Good point. I'm afraid that we will only support the specification of
the origin with a fixed resolution of microseconds, and between the
year 1 and 9999 (mainly for ``datetime`` compatibility, but also to
avoid the 'chicken and egg' effect that you noticed ;-).
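
As an illustration of what such an origin buys, here is a small sketch
using the standard datetime type, which has exactly these limits
(microsecond resolution, years 1 to 9999). The helper name
ticks_to_datetime and the idea of passing the unit as a timedelta are
assumptions for the example, not part of the proposal.

```python
import datetime

# Limits of the stdlib type that the origin would be compatible with.
assert datetime.datetime.min.year == 1
assert datetime.datetime.max.year == 9999
assert datetime.datetime.resolution == datetime.timedelta(microseconds=1)


def ticks_to_datetime(ticks, unit, origin):
    """Map an integer tick count to an absolute datetime (hypothetical)."""
    return origin + ticks * unit


# Data in days, but the origin pins "0 days" to noon, not midnight.
origin = datetime.datetime(2008, 7, 14, 12, 0, 0)
one_day = datetime.timedelta(days=1)

day_zero = ticks_to_datetime(0, one_day, origin)
day_three = ticks_to_datetime(3, one_day, origin)
```

Because the origin carries more precision than the data, tick 0 means
2008-07-14 12:00:00 unambiguously, and tick 3 lands on 2008-07-17 12:00:00.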