[Numpy-discussion] datetimes with date vs time units, local time, and time zones

Mark Wiebe mwwiebe@gmail....
Thu Jun 16 10:27:07 CDT 2011

On Thu, Jun 16, 2011 at 9:18 AM, Benjamin Root <ben.root@ou.edu> wrote:

> On Wednesday, June 15, 2011, Mark Wiebe <mwwiebe@gmail.com> wrote:
> > Towards a reasonable behavior with regard to local times, I've made the
> default repr for datetimes use the C standard library to print them in a
> local ISO format. Combined with the ISO8601-prescribed behavior of
> interpreting datetime strings with no timezone specifier to be in local
> times, this allows the following cases to behave reasonably:
> >
> > >>> np.datetime64('now')
> > numpy.datetime64('2011-06-15T15:16:51-0500','s')
> > >>> np.datetime64('2011-06-15T18:00')
> > numpy.datetime64('2011-06-15T18:00-0500','m')
> > As noted in another thread, there can be some extremely surprising
> behavior as a consequence:
> >
> > >>> np.array(['now', '2011-06-15'], dtype='M')
> > array(['2011-06-15T15:18:26-0500', '2011-06-14T19:00:00-0500'],
> >       dtype='datetime64[s]')
> > Having the 15th of June print out as 7pm on the 14th of June is probably
> not what one would generally expect, so I've come up with an approach which
> hopefully deals with this in a good way.
> >
> > One firm principle of the datetime in NumPy is that it is always stored
> as a POSIX time (referencing UTC), or a TAI time. There are two categories
> of units that can be used, which I will call date units and time units. The
> date units are 'Y', 'M', 'W', and 'D', while the time units are 'h', 'm',
> 's', ..., 'as'. Time zones are only applied to datetimes stored in time
> units, so there's a qualitative difference between date and time units with
> respect to string conversions and calendar operations.
> >
> > I would like to place an 'unsafe' casting barrier between the date units
> and the time units, so that the above conversion from a date into a datetime
> will raise an error instead of producing a confusing result. This only
> applies to datetimes and not timedeltas, because for timedeltas the day <->
> hour case is fine, it is just the year/month <-> other units which has
> issues, and that is already treated with an 'unsafe' casting barrier.
> >
> > Two new functions will facilitate the conversions between datetimes with
> date units and time units:
> > date_as_datetime(datearray, hour, minute, second, microsecond,
> timezone='local', unit=None, out=None), which converts the provided dates
> into datetimes at the specified time, according to the specified timezone.
> If 'unit' is specified, it controls the output unit, otherwise it is the
> units in 'out' or the amount of precision specified in the function.
> >
> > datetime_as_date(datetimearray, timezone='local', out=None), which
> converts the provided datetimes into dates according to the specified
> timezone.
> > In both functions, timezone can be any of 'UTC', 'TAI', 'local',
> '+/-####', or a datetime.tzinfo object. The latter will allow NumPy
> datetimes to work with the pytz library for flexible time zone support.
> >
> > I would also like to extend the 'today' input string parsing to accept
> strings like 'today 12:30' to allow a convenient way to express different
> local times occurring today, mostly useful for interactive usage.
> >
> > I welcome any comments on this design, particularly if you can find a
> case where this doesn't produce a reasonable behavior.
> > Cheers,
> > Mark
> >
> Is the output for the given usecase above with the mix of 'now' and a
> datetime string without tz info intended to still be correct?

No, that case would fail. The resolution of 'now' is seconds, and the
resolution of a date string is days, so the case would require a conversion
across the date unit/time unit boundary.
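
A minimal sketch of the barrier described above, in plain Python rather
than NumPy. The helper names (unit_category, can_cast_datetime_unit) are
hypothetical and only model the date unit/time unit rule, not the other
casting rules between units of the same category:

```python
# Unit lists as given in the proposal; the time units run from hours
# down to attoseconds.
DATE_UNITS = ("Y", "M", "W", "D")
TIME_UNITS = ("h", "m", "s", "ms", "us", "ns", "ps", "fs", "as")


def unit_category(unit):
    """Classify a datetime unit as a 'date' unit or a 'time' unit."""
    if unit in DATE_UNITS:
        return "date"
    if unit in TIME_UNITS:
        return "time"
    raise ValueError("unknown datetime unit: %r" % (unit,))


def can_cast_datetime_unit(from_unit, to_unit, casting="same_kind"):
    """Hypothetical check: crossing the date/time boundary needs 'unsafe'."""
    if unit_category(from_unit) == unit_category(to_unit):
        return True
    return casting == "unsafe"


# 'now' has unit 's' and '2011-06-15' has unit 'D', so putting both in
# one array needs a D -> s cast, which the barrier rejects by default.
mixed_allowed = can_cast_datetime_unit("D", "s")
forced_allowed = can_cast_datetime_unit("D", "s", casting="unsafe")
```

Under this rule the mixed ['now', '2011-06-15'] input would raise instead
of silently printing the date as 7pm on the previous day.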

> I personally have misgivings about interpreting phrases like "now" and
> "today" at this level.  I think it introduces a can of worms that
> would be difficult to handle.

I like the convenience it gives at the interactive prompt, but maybe a
datetime_from_string function where you can selectively enable/disable
allowing of these special values and local times can provide control over
this. This is similar to the datetime_as_string function which gives more
flexibility than simple conversion to a string.
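
To make the idea concrete, here is a rough sketch of what such a function
could look like, written with the standard library datetime module. The
signature and the flag names (allow_special, allow_local) are assumptions
for illustration, not an existing NumPy API, and only 'now' is handled as
a special value:

```python
from datetime import datetime, timezone


def datetime_from_string(text, allow_special=True, allow_local=True):
    """Hypothetical parser returning a timezone-aware UTC datetime."""
    if text == "now":
        if not allow_special:
            raise ValueError("special value 'now' is disabled")
        return datetime.now(timezone.utc)

    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        if not allow_local:
            raise ValueError("no timezone given and local time is disabled")
        # A naive datetime is interpreted in the system's local timezone.
        parsed = parsed.astimezone()
    return parsed.astimezone(timezone.utc)


# An explicit offset always parses, whatever the flags say.
strict = datetime_from_string("2011-06-15T18:00-05:00",
                              allow_special=False, allow_local=False)
```

With both flags off, a script gets errors for 'now' and for strings
without an offset, while the interactive default stays permissive.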

> Consider some arbitrary set of inputs to the array function for
> datetime objects.  If they all contain no tz info, then they are all
> interpreted the same as-is.  However, if even one element has 'now',
> then the inputs are interpreted entirely differently.  This will
> confuse people.

The element 'now' has no effect on the other inputs, except to possibly
promote the unit to a seconds level of precision. All datetimes are in UTC.
When timezone information is given, it is used only for parsing the input;
it is not preserved.

> Just thinking out loud here: what about a case where the inputs are
> such that some do not specify tz and some others specify a mix of
> timezones? Should that be any different from the case given above?

I think this has the same answer, everything gets converted to UTC.
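
As an illustration of "everything gets converted to UTC", here is a small
sketch using the standard library rather than NumPy. The parse_to_utc
helper and its local_offset parameter are hypothetical; local_offset
stands in for the machine's local zone (fixed at -05:00 here) and is
applied only to strings that carry no offset:

```python
from datetime import datetime, timedelta, timezone


def parse_to_utc(text, local_offset=timedelta(hours=-5)):
    """Parse an ISO 8601 string and return a naive datetime in UTC."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # No offset in the string: treat it as local time.
        parsed = parsed.replace(tzinfo=timezone(local_offset))
    # The offset is used for the conversion and then thrown away.
    return parsed.astimezone(timezone.utc).replace(tzinfo=None)


inputs = [
    "2011-06-15T18:00",        # no offset, read as local (-05:00 here)
    "2011-06-15T18:00+02:00",
    "2011-06-15T18:00-08:00",
]
as_utc = [parse_to_utc(s) for s in inputs]
```

All three results are plain UTC values (23:00, 16:00, and 02:00 on the
16th); nothing in the stored value records which offset each string used.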

> It has been a while for me, but how different is this from Perl's
> floating tz for its datetime module?  Maybe we could combine its
> approach with your "unsafe" barrier for the ambiguous situations that
> Perl's datetime module mentions?

I'd rather not attach timezone information to the NumPy datetime. The pytz
library appears to already support this kind of thing, and I see no reason
to duplicate that effort. I would rather support the pytz timezone objects
in certain datetime manipulation routines.
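
A sketch of how conversion routines could accept a tzinfo object while the
stored values stay in UTC. It borrows the names date_as_datetime and
datetime_as_date from the proposal, but the bodies, the reduced argument
lists, and the list-based inputs are assumptions for illustration. It uses
the standard library's fixed-offset timezone; pytz zones would need their
own localize() call instead of the tzinfo= argument used here:

```python
from datetime import date, datetime, time, timedelta, timezone


def date_as_datetime(dates, hour=0, minute=0, second=0, tz=timezone.utc):
    """Place each date at the given wall-clock time in tz; return naive UTC."""
    result = []
    for d in dates:
        local = datetime.combine(d, time(hour, minute, second), tzinfo=tz)
        result.append(local.astimezone(timezone.utc).replace(tzinfo=None))
    return result


def datetime_as_date(utc_datetimes, tz=timezone.utc):
    """Return the calendar date each naive UTC datetime falls on in tz."""
    return [
        dt.replace(tzinfo=timezone.utc).astimezone(tz).date()
        for dt in utc_datetimes
    ]


central = timezone(timedelta(hours=-5))  # fixed offset standing in for CDT

six_pm = date_as_datetime([date(2011, 6, 15)], hour=18, tz=central)
back = datetime_as_date(six_pm, tz=central)
midnight_utc = datetime_as_date([datetime(2011, 6, 15)], tz=central)
```

Here six_pm holds 23:00 UTC on the 15th, back recovers the 15th, and
midnight_utc gives the 14th, which is the surprise from the original
example made explicit by the choice of timezone.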


> Ben Root
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion