[Numpy-discussion] fixing up datetime

alan@ajackso... alan@ajackso...
Mon Jun 13 11:59:30 CDT 2011

I'm joining this late (I've been traveling), but it might be useful to
look at the fairly new R package "lubridate". Its authors have put
considerable thought into simplifying date handling, and when I have used
it I have generally been pleased. The documentation is quite readable.

Just Google it and it will be at the top.

>Hey all,
>So I'm doing a summer internship at Enthought, and the first thing they
>asked me to look into is finishing the datetime type in numpy. It turns out
>that the estimates of how complete the type was weren't accurate, and to
>support what the NEP describes required generalizing the ufunc type
>resolution system. I also found that the date/time parsing code (based on
>mxDateTime) was not robust, producing something for almost any arbitrary
>garbage input. I've replaced much of the broken code and implemented a lot
>of the functionality, and thought this might be a good point to do a pull
>request on what I've got and get feedback on the issues I've run into.
>* The existing datetime-related API is probably not useful, and in fact
>those functions aren't used internally anymore. Is it reasonable to remove
>the functions, or do we just deprecate them?
>* Leap seconds probably deserve a rigorous treatment, but having an internal
>representation with leap-seconds overcomplicates otherwise very simple and
>fast operations. Could we internally use a value matching TAI or GPS time?
>Currently it's a UTC time in the present, but the 1970 epoch is then not the
>UTC 1970 epoch, but tens of seconds off, and this isn't properly specified.
>What are people's opinions? The Python datetime.datetime doesn't support
>leap seconds (seconds == 60 is disallowed).
>* Default conversion to string - should it be in UTC or with the local
>timezone baked in? As UTC it may be confusing because 'now' will print as a
>different time than people would expect.
>* Business days - The existing business-day idea doesn't seem very useful,
>representing just the western M-F work week and not accounting for holidays.
>I've come up with a design which might address these issues: Extend the
>metadata for business days with a string identifier, like 'M8[B:USA]', then
>have a global internal dictionary which maps 'USA' to a workweek mask and a
>list of holidays. The call to prepare this dictionary for a particular
>business day type might look like np.set_business_days('USA', [1, 1, 1, 1,
>1, 0, 0], np.array([ list of US holidays ], dtype='M8[D]')). Internally,
>business days would be stored the same as regular days, but with special
>treatment where landing on a weekend or holiday gives you back a NaT.
>If you are interested in the business day functionality, please comment on
>this design!
>* The dtype constructor accepted 'O#' for object types, something I think
>was wrong. I've removed that, but allow # to be 4 or 8, producing a
>deprecation warning if it occurs.
>* Would it make sense to offset the week-based datetime's epoch so it aligns
>with ISO 8601's week format? Jan 1, 1970 is a Thursday, but the YYYY-Www
>date format uses weeks starting on Monday. I think producing strings in this
>format when the datetime has units of weeks would be a natural thing to do.
>* Should the NaT (not-a-time) value behave like floating-point NaN, i.e.
>should NaT == NaT return false, etc.? Should operations generating NaT trigger an
>'invalid' floating point exception in ufuncs?
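
The business-day design in the quoted message (a global dictionary mapping an identifier such as 'USA' to a workweek mask and a holiday list, with non-business days coming back as NaT) could be sketched in plain Python roughly as follows. This is only an illustration of the idea, not NumPy code: the names `_BUSINESS_DAYS`, `set_business_days` and `as_business_day` are hypothetical, the mask is assumed to start on Monday, and `None` stands in for NaT.

```python
import datetime

# Hypothetical registry: identifier -> (workweek mask, set of holidays).
# The mask has seven entries, assumed Monday first, with 1 meaning a working day.
_BUSINESS_DAYS = {}


def set_business_days(name, weekmask, holidays):
    """Register a workweek mask and a holiday list under a string identifier."""
    if len(weekmask) != 7:
        raise ValueError("weekmask must have seven entries, Monday first")
    _BUSINESS_DAYS[name] = (tuple(weekmask), frozenset(holidays))


def as_business_day(name, day):
    """Return day if it is a business day under name, else None (standing in for NaT)."""
    weekmask, holidays = _BUSINESS_DAYS[name]
    # date.weekday() numbers Monday as 0, matching the assumed mask order.
    if not weekmask[day.weekday()] or day in holidays:
        return None
    return day


set_business_days(
    "USA",
    [1, 1, 1, 1, 1, 0, 0],
    [datetime.date(2011, 7, 4)],  # illustrative only, not a full holiday list
)
```

With this registration, a Saturday or the listed holiday maps to `None`, while an ordinary Tuesday is returned unchanged.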

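The week-alignment question in the quoted message can be checked with the standard library alone. The sketch below computes how far Jan 1, 1970 sits from the Monday that starts its ISO 8601 week, and formats a date in the YYYY-Www style mentioned there. The helper name `iso_week_string` is made up for this example.

```python
import datetime

epoch = datetime.date(1970, 1, 1)

# isoweekday() numbers Monday as 1 and Sunday as 7, so Thursday is 4.
epoch_weekday = epoch.isoweekday()

# The Monday that begins the ISO week containing the epoch.
iso_week_start = epoch - datetime.timedelta(days=epoch_weekday - 1)

# Days the week-unit epoch would have to move back to land on a Monday.
offset_days = (epoch - iso_week_start).days


def iso_week_string(day):
    """Format a date as YYYY-Www using the ISO 8601 week-numbering year."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year:04d}-W{iso_week:02d}"
```

Note that the ISO week-numbering year can differ from the calendar year near January 1, which is why the string is built from `isocalendar()` rather than from `day.year`.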

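Two of the points in the quoted message lean on existing behaviour that is easy to confirm: Python's `datetime.datetime` rejecting a seconds value of 60, and floating-point NaN comparing unequal to itself (the model being asked about for NaT). A small check, using only the standard library; the function name `accepts_leap_second` is hypothetical.

```python
import datetime


def accepts_leap_second():
    """Report whether datetime.datetime accepts a seconds field of 60."""
    try:
        # 2008-12-31 23:59:60 UTC was an actual leap second.
        datetime.datetime(2008, 12, 31, 23, 59, 60)
    except ValueError:
        return False
    return True


# IEEE 754 NaN never compares equal, not even to itself.
nan = float("nan")
nan_equals_itself = (nan == nan)
```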