[Numpy-discussion] fixing up datetime
Tue Jun 14 18:19:45 CDT 2011
On Mon, Jun 13, 2011 at 11:59 AM, <firstname.lastname@example.org> wrote:
> I'm joining this late (I've been traveling), but it might be useful to
> look at the fairly new R module "lubridate". They have put quite some
> thought into simplifying date handling, and when I have used it I have
> generally been quite pleased. The documentation is quite readable.
> Just Google it and it will be at the top.
Cool, thanks for the link! I've read through the "made easy" pdf, here are
some points about it:
- It appears to gloss over how they resolve the ambiguities when adding
months and years, or days in local time zones across daylight savings
boundaries. The examples they give are when it's not ambiguous, like adding
a year to January 1st, or adding a second when daylight savings time
springs forward. They do mention returning NA in that spring forward
gap, but don't mention what they do in the fall back hour when the same
clock time represents two different moments an hour apart.
- They mention the case timedelta / timedelta, which the NEP doesn't cover
and which produces a unitless scalar. That's something to add to NumPy.
- For NumPy, I'm planning to just support UTC time, with some functions to
provide local timezone manipulations which use the OS's timezone setting.
Lubridate appears to have a timezone attached to the date.
- Lubridate distinguishes between "durations" and "periods", where durations
are in fixed time units and periods are relative to the date, like months
and years. The NumPy approach I've taken is for conversions between these
types of units to require unsafe casting and/or fail during type promotion.
- They use accessor functions for the year, month, day, etc. components of
the datetime. I'm thinking this could be done in NumPy by extending the
structured dtype idea with "derived fields", which look like structured
dtype fields, but are computed from the value instead of stored directly.
This idea isn't fully worked out yet, however.
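To make the duration/period distinction and the month-addition ambiguity concrete, here is a rough sketch using only Python's standard datetime module, not the NumPy datetime type itself. The add_months helper and its clamp flag are hypothetical names made up for illustration, and returning None merely stands in for an NA/NaT-style result.

```python
import calendar
from datetime import date, timedelta

# Durations are fixed-length, so dividing one by another gives a plain number.
ratio = timedelta(days=3) / timedelta(hours=12)

def add_months(d, months, clamp=True):
    """Hypothetical helper: add calendar months (a "period") to a date.

    If the target month has no such day, either clamp to the last day of
    that month or return None to signal that the result is ambiguous.
    """
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    if d.day <= last_day:
        return date(year, month, d.day)
    return date(year, month, last_day) if clamp else None

# Unambiguous: January 1st plus twelve months.
unambiguous = add_months(date(2011, 1, 1), 12)
# Ambiguous: there is no February 31st, so a policy has to be chosen.
clamped = add_months(date(2011, 1, 31), 1)
rejected = add_months(date(2011, 1, 31), 1, clamp=False)

print(ratio, unambiguous, clamped, rejected)
```

Neither policy in the ambiguous case is obviously right, which is part of why keeping conversions between the two kinds of units explicit seems safer.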
> >Hey all,
> >So I'm doing a summer internship at Enthought, and the first thing they
> >asked me to look into is finishing the datetime type in numpy. It turns
> >out that the estimates of how complete the type was weren't accurate, and to
> >support what the NEP describes required generalizing the ufunc type
> >resolution system. I also found that the date/time parsing code (based on
> >mxDateTime) was not robust, producing something for almost any arbitrary
> >garbage input. I've replaced much of the broken code and implemented a lot
> >of the functionality, and thought this might be a good point to do a pull
> >request on what I've got and get feedback on the issues I've run into.
> >* The existing datetime-related API is probably not useful, and in fact
> >those functions aren't used internally anymore. Is it reasonable to remove
> >the functions, or do we just deprecate them?
> >* Leap seconds probably deserve a rigorous treatment, but having a
> >representation with leap seconds overcomplicates otherwise very simple and
> >fast operations. Could we internally use a value matching TAI or GPS time?
> >Currently it's a UTC time in the present, but the 1970 epoch is then not
> >the UTC 1970 epoch, but tens of seconds off, and this isn't properly specified.
> >What are people's opinions? The Python datetime.datetime doesn't support
> >leap seconds (seconds == 60 is disallowed).
> >* Default conversion to string - should it be in UTC or with the local
> >timezone baked in? As UTC it may be confusing because 'now' will print as
> >a different time than people would expect.
> >* Business days - The existing business idea doesn't seem very useful,
> >representing just the western M-F work week and not accounting for
> >holidays. I've come up with a design which might address these issues: Extend
> >the metadata for business days with a string identifier, like 'M8[B:USA]', and
> >have a global internal dictionary which maps 'USA' to a workweek mask and
> >list of holidays. The call to prepare this dictionary for a particular
> >business day type might look like np.set_business_days('USA', [1, 1, 1, 1,
> >1, 0, 0], np.array([ list of US holidays ], dtype='M8[D]')). Internally,
> >business days would be stored the same as regular days, but with special
> >treatment where landing on a weekend or holiday gives you back a NaT.
> >If you are interested in the business day functionality, please comment on
> >this design!
> >* The dtype constructor accepted 'O#' for object types, something I think
> >was wrong. I've removed that, but allow # to be 4 or 8, producing a
> >deprecation warning if it occurs.
> >* Would it make sense to offset the week-based datetime's epoch so it
> >aligns with ISO 8601's week format? Jan 1, 1970 is a Thursday, but the YYYY-Www
> >date format uses weeks starting on Monday. I think producing strings in
> >this format when the datetime has units of weeks would be a natural thing to do.
> >* Should the NaT (not-a-time) value behave like floating-point NaN? i.e. NaT
> >== NaT return false, etc. Should operations generating NaT trigger an
> >'invalid' floating point exception in ufuncs?
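On the week-epoch question above, the mismatch is easy to check with plain date arithmetic. This is only a sketch with the standard library, not the NumPy implementation; the week_index helper is a hypothetical name, and it assumes weeks are counted by floor division of the day offset.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

# isoweekday() numbers Monday as 1 through Sunday as 7.
epoch_weekday = EPOCH.isoweekday()

# The Monday that starts the ISO week containing the epoch.
iso_week_start = EPOCH - timedelta(days=epoch_weekday - 1)

def week_index(d, origin):
    """Hypothetical helper: whole weeks from origin to d, rounding down."""
    return (d - origin).days // 7

# A Wednesday and the following Thursday, both in the same ISO week.
wednesday = date(2011, 6, 15)
thursday = date(2011, 6, 16)

# Counting from the Thursday epoch puts them in different weeks...
split_by_epoch = week_index(wednesday, EPOCH) != week_index(thursday, EPOCH)
# ...while counting from the preceding Monday keeps them together.
same_when_aligned = (week_index(wednesday, iso_week_start)
                     == week_index(thursday, iso_week_start))

print(epoch_weekday, iso_week_start, split_by_epoch, same_when_aligned)
```

So aligning with the ISO week would amount to shifting the origin for week units back three days, to Monday, December 29, 1969.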
> | Alan K. Jackson | To see a World in a Grain of Sand |
> | email@example.com | And a Heaven in a Wild Flower, |
> | www.ajackson.org | Hold Infinity in the palm of your hand |
> | Houston, Texas | And Eternity in an hour. - Blake |