[Numpy-discussion] fixing up datetime
Wed Jun 1 15:05:12 CDT 2011
So I'm doing a summer internship at Enthought, and the first thing they
asked me to look into is finishing the datetime type in numpy. It turns out
that the estimates of how complete the type was weren't accurate, and
supporting what the NEP describes required generalizing the ufunc type
resolution system. I also found that the date/time parsing code (based on
mxDateTime) was not robust: it produced a result for almost any garbage
input. I've replaced much of the broken code and implemented a lot of the
functionality, and thought this might be a good point to open a pull
request with what I've got and get feedback on the issues I've run into.
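The problem with the mxDateTime-based parser is that it guessed instead of failing. Here is a minimal sketch of the opposite policy. It assumes only the plain YYYY-MM-DD form needs to be accepted: match the whole string, then let datetime.date validate the calendar fields. The name parse_iso_date is hypothetical, and this is not the parser in the pull request.

```python
import datetime
import re

# Accept exactly YYYY-MM-DD and nothing else (an assumption for this sketch).
_ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_iso_date(text):
    """Parse 'YYYY-MM-DD' strictly, raising ValueError on anything else."""
    match = _ISO_DATE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an ISO 8601 date: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    # datetime.date rejects out-of-range fields such as 2011-02-30.
    return datetime.date(year, month, day)
```

With this, parse_iso_date("2011-06-01") gives datetime.date(2011, 6, 1). Inputs like "garbage" or "2011-02-30" raise ValueError instead of producing some date.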
* The existing datetime-related API is probably not useful, and in fact
those functions aren't used internally anymore. Is it reasonable to remove
the functions, or do we just deprecate them?
* Leap seconds probably deserve a rigorous treatment, but having an internal
representation with leap-seconds overcomplicates otherwise very simple and
fast operations. Could we internally use a value matching TAI or GPS time?
Currently it matches UTC for present-day times, but then the 1970 epoch is
not the UTC 1970 epoch; it is off by tens of seconds, and this isn't
properly specified. What are people's opinions? Python's datetime.datetime
doesn't support leap seconds (seconds == 60 is disallowed).
* Default conversion to string - should it be in UTC or with the local
timezone baked in? In UTC it may be confusing, because 'now' will print as a
different time than people expect.
* Business days - The existing business day support doesn't seem very
useful: it represents just the Western Monday-Friday work week and doesn't
account for holidays.
I've come up with a design which might address these issues: Extend the
metadata for business days with a string identifier, like 'M8[B:USA]', then
have a global internal dictionary which maps 'USA' to a workweek mask and a
list of holidays. The call to prepare this dictionary for a particular
business day type might look like np.set_business_days('USA', [1, 1, 1, 1,
1, 0, 0], np.array([ list of US holidays ], dtype='M8[D]')). Internally,
business days would be stored the same as regular days, but with special
treatment where landing on a weekend or holiday gives you back a NaT.
If you are interested in the business day functionality, please comment on
* The dtype constructor accepted 'O#' for object types, which I think was
wrong. I've removed that, but still allow # to be 4 or 8, with a deprecation
warning when it occurs.
* Would it make sense to offset the week-based datetime's epoch so it aligns
with ISO 8601's week format? Jan 1, 1970 is a Thursday, but the YYYY-Www
date format uses weeks starting on Monday. I think producing strings in this
format when the datetime has units of weeks would be a natural thing to do.
* Should the NaT (not-a-time) value behave like floating-point NaN, e.g. with
NaT == NaT returning False? Should operations generating NaT trigger an
'invalid' floating point exception in ufuncs?
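On the leap-second question: Python's datetime.datetime refuses a seconds field of 60, and its subtraction assumes every minute has exactly 60 seconds. The sketch below shows both effects across the leap second inserted at the end of 2008. It also shows how a TAI-like count would add one second for each leap second crossed. The one-entry table is deliberately partial and illustrative; real code would need the full published leap-second list. tai_like_elapsed is a hypothetical helper, not a numpy function.

```python
import datetime


def accepts_leap_second():
    """Return True if datetime.datetime can represent 23:59:60 (it cannot)."""
    try:
        datetime.datetime(2008, 12, 31, 23, 59, 60)
    except ValueError:
        return False
    return True


# Partial, illustrative table: the UTC instant immediately after each inserted
# leap second. Only the 2008-12-31 23:59:60 leap second is listed here.
LEAP_SECONDS = [datetime.datetime(2009, 1, 1)]


def tai_like_elapsed(start, end, table=LEAP_SECONDS):
    """Elapsed SI seconds from start to end (assumes start <= end), adding
    one second for each leap second completed in (start, end]."""
    extra = sum(1 for t in table if start < t <= end)
    return (end - start).total_seconds() + extra


before = datetime.datetime(2008, 12, 31, 23, 59, 59)
after = datetime.datetime(2009, 1, 1)
naive = (after - before).total_seconds()  # 1.0: UTC arithmetic skips 23:59:60
actual = tai_like_elapsed(before, after)  # 2.0: includes the leap second
```

With a TAI-like internal count, differences are correct with plain integer subtraction. The leap-second table is then needed only when converting to or from UTC fields for parsing and printing.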
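For the default string conversion, the UTC and local-time options differ only in the zone applied at formatting time; the stored instant is the same. Below is a small standard-library sketch using this message's own timestamp. A fixed UTC-5 offset stands in for CDT, which is an assumption: a real local-time default would take the system zone, e.g. astimezone() with no argument. to_string is a hypothetical helper name.

```python
import datetime

UTC = datetime.timezone.utc
# Fixed offset standing in for US Central Daylight Time (an assumption; real
# local time would come from the system's time zone rules).
CDT = datetime.timezone(datetime.timedelta(hours=-5), "CDT")


def to_string(instant, tz=UTC):
    """Format an aware datetime as ISO 8601 in the given zone (UTC by default)."""
    return instant.astimezone(tz).isoformat()


# The time this message was sent: 15:05:12 CDT == 20:05:12 UTC.
sent = datetime.datetime(2011, 6, 1, 20, 5, 12, tzinfo=UTC)
print(to_string(sent))       # 2011-06-01T20:05:12+00:00
print(to_string(sent, CDT))  # 2011-06-01T15:05:12-05:00
```

Either way, printing the offset keeps the string unambiguous. The open question is only which zone readers see by default.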
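The business-day design can be prototyped in plain Python before touching the dtype machinery. In this sketch, the registry, set_business_days, is_business_day, to_business_day and add_business_days are hypothetical stand-ins for the proposed np.set_business_days and 'M8[B:USA]' behavior. None stands in for NaT. The holiday list holds a single illustrative date rather than a real US calendar.

```python
import datetime

# Hypothetical stand-in for the proposed global dictionary behind 'M8[B:name]'.
_BUSINESS_DAYS = {}


def set_business_days(name, weekmask, holidays):
    """Register a workweek mask (Monday first, 1 = workday) and holidays."""
    if len(weekmask) != 7 or not any(weekmask):
        raise ValueError("weekmask needs 7 entries with at least one workday")
    _BUSINESS_DAYS[name] = (tuple(bool(x) for x in weekmask), frozenset(holidays))


def is_business_day(day, name):
    weekmask, holidays = _BUSINESS_DAYS[name]
    return weekmask[day.weekday()] and day not in holidays


def to_business_day(day, name):
    """Return day if it is a business day, else None (standing in for NaT)."""
    return day if is_business_day(day, name) else None


def add_business_days(day, n, name):
    """Step n business days from day; None (NaT) if day is not a business day."""
    if not is_business_day(day, name):
        return None
    step = datetime.timedelta(days=1 if n >= 0 else -1)
    for _ in range(abs(n)):
        day += step
        # Skip weekends and holidays; terminates because the mask has a workday.
        while not is_business_day(day, name):
            day += step
    return day


# Monday-Friday workweek with one illustrative holiday (not a full calendar).
set_business_days("USA", [1, 1, 1, 1, 1, 0, 0], [datetime.date(2011, 7, 4)])
```

Here add_business_days(datetime.date(2011, 7, 1), 1, "USA") lands on 2011-07-05, skipping the weekend and the July 4 holiday. to_business_day(datetime.date(2011, 7, 2), "USA") returns None because that date is a Saturday.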
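For the week-epoch question: 1970-01-01 is a Thursday, so an ISO-aligned week count would start on the preceding Monday, 1969-12-29, three days earlier. Here is a sketch with hypothetical helper names. It uses date.isocalendar() to produce the YYYY-Www label.

```python
import datetime

UNIX_EPOCH = datetime.date(1970, 1, 1)  # a Thursday
# Monday of the ISO week containing the Unix epoch: 1969-12-29.
ISO_WEEK_EPOCH = UNIX_EPOCH - datetime.timedelta(days=UNIX_EPOCH.weekday())


def week_index(day):
    """Whole weeks from ISO_WEEK_EPOCH to day (floor division handles pre-epoch days)."""
    return (day - ISO_WEEK_EPOCH).days // 7


def format_week(index):
    """Render a week index in ISO 8601 YYYY-Www form."""
    monday = ISO_WEEK_EPOCH + datetime.timedelta(weeks=index)
    iso_year, iso_week, _ = monday.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"
```

Week 0 formats as "1970-W01", and 2011-06-01, the date of this message, falls in "2011-W22". With this offset, every week index maps to exactly one ISO week label.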
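On NaT semantics, the floating-point analogy can be made concrete. In IEEE 754, NaN compares unequal to everything, including itself. A quiet NaN propagates through arithmetic without signaling, while an operation that creates a NaN from ordinary operands (such as inf - inf) raises the 'invalid' flag. The sketch below applies those rules to datetimes stored as int64 counts, with the most negative value reserved as NaT. The helper names and the on_invalid switch are hypothetical; the switch is loosely modeled on np.errstate(invalid=...).

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
# Assumption for this sketch: the most negative int64 is reserved for NaT.
NAT = INT64_MIN


class InvalidTimeError(ArithmeticError):
    """Raised when on_invalid='raise' and an operation generates a new NaT."""


def dt_equal(a, b):
    # NaN-like comparison: anything involving NaT is False, even NaT == NaT.
    return a != NAT and b != NAT and a == b


def dt_add(a, delta, on_invalid="ignore"):
    # An existing NaT propagates quietly, like NaN + 1 (no 'invalid' signal).
    if a == NAT or delta == NAT:
        return NAT
    result = a + delta
    if not NAT < result <= INT64_MAX:
        # Overflow generates a brand-new NaT: the analogue of IEEE 'invalid'.
        if on_invalid == "raise":
            raise InvalidTimeError("datetime arithmetic out of range")
        return NAT
    return result
```

This separates the two questions above. NaT == NaT is False regardless, and only the overflow path would trigger the 'invalid' signal.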