[Numpy-discussion] fixing up datetime
Charles R Harris
Wed Jun 1 15:52:08 CDT 2011
On Wed, Jun 1, 2011 at 2:05 PM, Mark Wiebe <firstname.lastname@example.org> wrote:
> Hey all,
> So I'm doing a summer internship at Enthought, and the first thing they
> asked me to look into is finishing the datetime type in numpy. It turns out
> that the estimates of how complete the type was weren't accurate, and to
> support what the NEP describes required generalizing the ufunc type
> resolution system. I also found that the date/time parsing code (based on
> mxDateTime) was not robust, producing something for almost any arbitrary
> garbage input. I've replaced much of the broken code and implemented a lot
> of the functionality, and thought this might be a good point to do a pull
> request on what I've got and get feedback on the issues I've run into.
> * The existing datetime-related API is probably not useful, and in fact
> those functions aren't used internally anymore. Is it reasonable to remove
> the functions, or do we just deprecate them?
> * Leap seconds probably deserve a rigorous treatment, but having an
> internal representation with leap-seconds overcomplicates otherwise very
> simple and fast operations. Could we internally use a value matching TAI or
> GPS time? Currently it's a UTC time in the present, but the 1970 epoch is
> then not the UTC 1970 epoch, but tens of seconds off, and this isn't properly
> specified. What are people's opinions? The Python datetime.datetime doesn't
> support leap seconds (seconds == 60 is disallowed).
> * Default conversion to string - should it be in UTC or with the local
> timezone baked in? As UTC it may be confusing because 'now' will print as a
> different time than people would expect.
> * Business days - The existing business-day idea doesn't seem very useful,
> representing just the Western M-F work week and not accounting for holidays.
> I've come up with a design which might address these issues: Extend the
> metadata for business days with a string identifier, like 'M8[B:USA]', then
> have a global internal dictionary which maps 'USA' to a workweek mask and a
> list of holidays. The call to prepare this dictionary for a particular
> business day type might look like np.set_business_days('USA', [1, 1, 1, 1,
> 1, 0, 0], np.array([ list of US holidays ], dtype='M8[D]')). Internally,
> business days would be stored the same as regular days, but with special
> treatment where landing on a weekend or holiday gives you back a NaT.
> If you are interested in the business day functionality, please comment on
> this design!
> * The dtype constructor accepted 'O#' for object types, something I think
> was wrong. I've removed that, but allow # to be 4 or 8, producing a
> deprecation warning if it occurs.
> * Would it make sense to offset the week-based datetime's epoch so it
> aligns with ISO 8601's week format? Jan 1, 1970 is a Thursday, but the
> YYYY-Www date format uses weeks starting on Monday. I think producing
> strings in this format when the datetime has units of weeks would be a
> natural thing to do.
> * Should the NaT (not-a-time) value behave like floating-point NaN? i.e.,
> NaT == NaT returns False, etc. Should operations generating NaT trigger an
> 'invalid' floating point exception in ufuncs?
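Several of the facts the list above leans on can be checked with the Python standard library alone. The following is a minimal sketch, not numpy code: it confirms that datetime.datetime rejects seconds == 60, that Jan 1, 1970 falls on a Thursday, where the enclosing ISO 8601 week starts, and how floating-point NaN compares with itself (the behavior NaT would copy). The variable names are illustrative only.

```python
import datetime
import math

# datetime.datetime has no leap seconds: second must be in 0..59.
try:
    datetime.datetime(2008, 12, 31, 23, 59, 60)
    leap_second_accepted = True
except ValueError:
    leap_second_accepted = False

# Jan 1, 1970 is a Thursday (weekday() counts Monday as 0).
epoch = datetime.date(1970, 1, 1)
epoch_weekday = epoch.weekday()

# ISO 8601 weeks start on Monday, so the ISO week containing the epoch
# begins a few days before the epoch itself.
iso_year, iso_week, iso_day = epoch.isocalendar()
iso_week_start = epoch - datetime.timedelta(days=iso_day - 1)

# IEEE 754 NaN compares unequal to itself.
nan_equals_itself = math.nan == math.nan
```

A week-based datetime aligned with ISO 8601 would therefore need its epoch shifted from the Thursday to the Monday that iso_week_start computes.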
Just a quick comment, as this really needs more thought, but time is a bag
of worms. Trying to represent some standard -- say seconds at the solar
system barycenter to account for general relativity -- is something that I
think is too complicated and specialized to put into numpy. Good support for
units and delta times is very useful, but parsing dates and times and
handling timezones, daylight saving time, leap seconds, business days, etc., is
probably best served by add-on packages specialized to an area of interest.
Just my $.02
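
To make the business-day design from the quoted message concrete, here is a rough pure-Python sketch of how a registry of workweek masks and holidays could live outside numpy. Everything in it is an assumption for illustration: the function names mirror the proposed np.set_business_days but are hypothetical, None stands in for NaT, and the single holiday is a placeholder for a real holiday list.

```python
import datetime

# Hypothetical registry: calendar name -> (workweek mask, holidays).
_BUSINESS_DAYS = {}


def set_business_days(name, weekmask, holidays):
    """Register a calendar; weekmask has 7 entries, Monday first."""
    if len(weekmask) != 7:
        raise ValueError("weekmask must have exactly 7 entries")
    mask = tuple(bool(flag) for flag in weekmask)
    _BUSINESS_DAYS[name] = (mask, frozenset(holidays))


def as_business_day(name, day):
    """Return day if it is a business day in the named calendar.

    Returns None (a stand-in for NaT) on a weekend or holiday.
    """
    weekmask, holidays = _BUSINESS_DAYS[name]
    if not weekmask[day.weekday()] or day in holidays:
        return None
    return day


# Illustrative calendar: M-F work week, one example holiday.
set_business_days("USA", [1, 1, 1, 1, 1, 0, 0],
                  [datetime.date(2011, 7, 4)])

weekday_result = as_business_day("USA", datetime.date(2011, 6, 1))
weekend_result = as_business_day("USA", datetime.date(2011, 6, 4))
holiday_result = as_business_day("USA", datetime.date(2011, 7, 4))
```

Dates are stored as ordinary days, as in the proposal; the calendar only decides whether a given day is valid.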