[Numpy-discussion] Time Zones and datetime64

Chris Barker - NOAA Federal chris.barker@noaa....
Mon Apr 8 14:24:10 CDT 2013


Recent discussion has made it clear that the timezone handling in the
current (numpy 1.7) version of datetime64 is broken. Below is a
discussion of some possible solutions, hopefully including most of the
comments made on the recent thread on this list.

http://mail.scipy.org/pipermail/numpy-discussion/2013-April/066038.html

The intent is that with a bit more discussion (focused, in this thread
at least) on the time zone issues, rather than other datetime64
issues, we can start a new datetime64 NEP.


Background:
===============================


The current version (numpy 1.7) of datetime64 appears to handle
timezones in the following ways:

datetime64s are assumed to be in UTC internally. Time zone translation
is done on I/O -- i.e., creating a new datetime64 and outputting to text
format or as a datetime.datetime object.

When creating a datetime64 from an ISO string, the timezone info in
the string is respected. If there is no timezone info in the string,
the system time zone (locale setting) is used. On output
(i.e., converting to text: __str__ and __repr__) the system locale is
used to set the timezone.

In [9]: np.datetime64('2013-04-08T12:00:00Z')
Out[9]: numpy.datetime64('2013-04-08T05:00:00-0700')

In [10]: np.datetime64('2013-04-08T12:00:00')
Out[10]: numpy.datetime64('2013-04-08T12:00:00-0700')

However, if a datetime.datetime is used without a tzinfo object (the
common case, as no tzinfo objects are provided with the python
stdlib), the timezone is assumed to be UTC:

In [13]: dt
Out[13]: datetime.datetime(2013, 4, 8, 12, 0)

In [14]: np.datetime64(dt)
Out[14]: numpy.datetime64('2013-04-08T05:00:00.000000-0700')

which can give some odd results, as it's different if you convert the
datetime object to an ISO string first:

In [15]: np.datetime64(dt.isoformat())
Out[15]: numpy.datetime64('2013-04-08T12:00:00-0700')

Converting from a datetime64 to a datetime object uses the UTC time
(the internal representation with no offset).
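
The mismatch between Out[14] and Out[15] comes down to which zone a
naive value is assumed to be in. A minimal stdlib-only sketch of the two
interpretations, assuming Python 3 (where datetime.timezone supplies
fixed-offset tzinfo objects) and a hard-coded -07:00 offset as a
stand-in for the machine's locale setting:

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed -07:00 offset stands in for the system locale.
LOCAL = timezone(timedelta(hours=-7))

dt = datetime(2013, 4, 8, 12, 0)  # naive, as in In [13]

# Interpretation used for datetime objects: the naive value is UTC.
as_utc = dt.replace(tzinfo=timezone.utc)

# Interpretation used for ISO strings without an offset: the value is local.
as_local = dt.replace(tzinfo=LOCAL)

# Both are displayed in the local zone, so they print differently.
shown_from_datetime = as_utc.astimezone(LOCAL).isoformat()
shown_from_string = as_local.astimezone(LOCAL).isoformat()

# The two interpretations are different instants, seven hours apart.
gap = as_local - as_utc
```

The same wall-clock input therefore ends up as two different stored
instants, depending only on the route by which it entered.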


Issues with the current configuration:
===============================

Using the locale time zone is a long-standing tradition, and is used by
the C standard library time functions. However, it is almost always
NOT what one wants in a typical numpy application. When working with
scientific (and financial) datasets, the time zone of the data at hand
is likely to have nothing to do with the timezone of the computer the
code is running on. Also, with cloud computing and web applications,
the time zone of the machine on which the code is running is
irrelevant to the user. A number of early adopters of datetime64 have
found that they needed to wrap the creation and use of datetime64
arrays to override the timezone behavior.

The current implementation may be natural for some interactive use,
but that's often not the case. It is particularly problematic because
datetime.datetime.now() gives local time, but with no time zone info,
so numpy treats it as UTC and appears to shift it.

In [19]: datetime.datetime.now().isoformat()
Out[19]: '2013-04-08T12:05:26.157475'

In [20]: np.datetime64(datetime.datetime.now())
Out[20]: numpy.datetime64('2013-04-08T05:05:45.813027-0700')

This is really ugly -- and regardless of whether we like what the std
lib does, we need to deal with it.

The python standard library datetime implementation uses "naive"
datetimes by default, with the provision for an optional "tzinfo"
object, so that the user can supply timezone info if desired. However,
the library does not provide any tzinfo objects out of the box. This
is because timezones are messy, complicated, and change over time, and
the core python devs did not want to be in the position of maintaining
that code. There is a third party "pytz" package
(http://pytz.sourceforge.net/) that provides a pretty complete
implementation of time zone handling for those that need it.

Note also that in the current implementation, the busday functions
just operate on datetime64[D]. There is no timezone interaction there
-- which makes it very hard for them to be useful, as it's a bit
tricky to make sure your datetime64 arrays are in the correct time
zone for your application. In fact, they are assuming that datetime64
is time zone naive, even though the I/O functions assume locale time.
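
To illustrate why the time zone matters for day-level functions, here
is a small stdlib-only sketch. The is_weekday helper is a hypothetical,
simplified stand-in for a business-day test (it is not the numpy busday
API), and the -07:00 offset is an assumed zone for the data:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the data belongs to a zone seven hours behind UTC.
DATA_ZONE = timezone(timedelta(hours=-7))

def is_weekday(dt):
    """Hypothetical simplified business-day test: Monday through Friday."""
    return dt.weekday() < 5

# One instant, viewed in two zones.
instant_utc = datetime(2013, 4, 13, 2, 0, tzinfo=timezone.utc)
instant_local = instant_utc.astimezone(DATA_ZONE)

# In UTC this falls on Saturday; in the data's zone it is still Friday.
utc_is_weekday = is_weekday(instant_utc)
local_is_weekday = is_weekday(instant_local)
```

Truncating to a date before settling the time zone can therefore move a
value across a weekend boundary.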


Proposed Alternatives:
======================

Principles:
------------------

 * Partial time zone handling is probably worse than none.
 * The library should never apply the locale timezone (or any other)
unless explicitly requested by the user.
 * It should be possible (and easy) to use "naive" datetime64 arrays.

1) A naive datetime64:
====================

This would follow, more or less, the python stdlib datetime approach
(with no tzinfo object) -- ignore timezones entirely. This model
leaves all time zone handling up to the user. In general, when working
with this method, applications either: use UTC everywhere, or use
"local time" everywhere, where local time is simply "all data is in
the same time zone" and it's up to the user (or lib code) to make sure
that's correct.

Issues:
------------------

The main issue with a naive datetime64 is what to do on creation if
time zone information is supplied (i.e., in an ISO 8601 string, or
datetime object with non-None tzinfo). Options include:
 - ignore the TZ info
 - raise an exception if there is a TZ adjustment (other than UTC, 00:00, or Z)

I propose that we raise an exception, unless there is a way to pass an
optional flag in to request timezone conversion.
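
A rough sketch of the "raise an exception" option, written against the
stdlib rather than numpy. The name parse_naive is hypothetical, and the
sketch assumes Python 3.7 or later for datetime.fromisoformat():

```python
from datetime import datetime, timedelta

def parse_naive(text):
    """Parse an ISO 8601 string into a naive datetime (hypothetical helper).

    Accepts no offset, 'Z', or a zero offset. Any other offset raises
    ValueError instead of being silently applied or dropped.
    """
    # Older versions of fromisoformat() reject a trailing 'Z', so
    # normalise it to an explicit zero offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is not None and dt.utcoffset() != timedelta(0):
        raise ValueError("non-UTC offset in %r" % text)
    # Drop the (zero) offset so the result is always naive.
    return dt.replace(tzinfo=None)

plain = parse_naive("2013-04-08T12:00:00")
zulu = parse_naive("2013-04-08T12:00:00Z")

try:
    parse_naive("2013-04-08T12:00:00-07:00")
    rejected = False
except ValueError:
    rejected = True
```

An optional flag requesting conversion, as suggested above, would
replace the raise with an explicit shift to the requested zone.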

Note that the stdlib datetime package does not provide an ISO8601
parsing function, so it has ignored the issue.

There is also the issue of what to provide on output/conversion. I propose:
 - a datetime object with no tzinfo
 - ISO8601 with no tz adjustment

np.datetime_as_string() could be used with options to allow the user
to request a time zone offset.
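
As an illustration of what such an option looks like, here is a sketch
using the timezone argument of np.datetime_as_string(). It assumes a
recent NumPy release, not the 1.7 behavior discussed in this post, so
treat it as an example of the idea rather than of the current API:

```python
import numpy as np

# Assumption: a recent NumPy, where datetime64 values carry no time zone.
times = np.array(["2013-04-08T12:00:00"], dtype="datetime64[s]")

# Default output: no offset and no 'Z' suffix.
naive_text = np.datetime_as_string(times)

# Explicitly requested: mark the values as UTC with a trailing 'Z'.
utc_text = np.datetime_as_string(times, timezone="UTC")
```

The key point is that the zone marker only appears when the caller asks
for it.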

2) UTC-only
====================

This would be very similar to a naive datetime64 -- there are no
timezone adjustments with pure UTC -- and would be similar to the
current implementation, except for I/O:

On conversion to datetime64, time zone offset would be respected, if
it exists in the ISO8601 string or the datetime object has a tzinfo
attribute. The value would be stored in UTC.

If there is no timezone info in the input string or datetime objects,
UTC is assumed.

On output -- UTC is used, no offset computed.
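
The UTC-only rules above can be sketched with the stdlib as follows.
The names to_utc and format_utc are hypothetical, and Python 3 is
assumed for datetime.timezone:

```python
from datetime import datetime, timedelta, timezone

def to_utc(dt):
    """Hypothetical input rule: respect an offset if present, else assume UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

def format_utc(dt):
    """Hypothetical output rule: always UTC, no offset computed, 'Z' suffix."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + "Z"

# Input carrying an explicit -07:00 offset is shifted to UTC on the way in.
aware = datetime(2013, 4, 8, 12, 0, tzinfo=timezone(timedelta(hours=-7)))
from_aware = format_utc(to_utc(aware))

# Input with no time zone info is taken to be UTC already.
naive = datetime(2013, 4, 8, 12, 0)
from_naive = format_utc(to_utc(naive))
```

Under these rules the locale of the machine never enters the picture.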

Issues:
------------------

The ISO string produced on output would logically contain the "Z" flag
to indicate UTC. This may confuse some apps that really expect a naive
datetime.

If there were a way to pass in a flag to create ISO strings indicating
the time zone, that would be perfect, probably using
np.datetime_as_string()


3) Optional time zone support
==========================

This would follow the standard library approach -- provide a hook for
a tzinfo object -- and if there, handle properly. This would allow one
to mix and match datetime64s that are in different time zones, etc.

Issues:
----------------

The biggest issue is that to be useful, you'd need a comprehensive
tzinfo package. pytz provides one, but then you'd need to go through
the python layer for every item in an array -- killing performance.
However, perhaps that would be worth it for those that need it, and
folks that need performance could use naive datetime64s.
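
For reference, this is roughly what the stdlib hook looks like, and why
it implies per-item work. FixedOffset is a hypothetical minimal tzinfo
subclass; a real package such as pytz would also handle DST and
historical changes:

```python
from datetime import datetime, timedelta, tzinfo

class FixedOffset(tzinfo):
    """Hypothetical minimal tzinfo: a constant offset with no DST rules."""

    def __init__(self, hours, name):
        self._offset = timedelta(hours=hours)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return self._name

UTC = FixedOffset(0, "UTC")
PDT = FixedOffset(-7, "PDT")

utc_times = [datetime(2013, 4, 8, 12 + h, 0, tzinfo=UTC) for h in range(3)]

# Each conversion calls back into the Python-level tzinfo methods,
# once per element. This is the per-item cost discussed above.
local_times = [t.astimezone(PDT) for t in utc_times]
```

An array-level implementation would need some way to avoid these
per-element Python calls, or accept the cost.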

There apparently is also a datetime library in Boost that has a nice
timezone object which could be used as inspiration for an equivalent
in NumPy. That could be a lot of work, though.

4) Full time zone support
==========================

This would be similar to the above, except that every datetime64 array
would be required to carry time zone info. This would probably be
reasonable, as one could use UTC everywhere if you wanted the simple
case. But it would require that a comprehensive tzinfo package be
included with numpy -- likely something we don't want to have to
maintain (even if someone wants to build it in the first place).

Issues:
-----------

We would still want ways to input/output naive datetimes -- some apps
simply don't want to deal with all this!





As I (Chris Barker) am not in a position to implement anything, I
advocate the simplest possible approach -- which I think is Naive
datetime and/or UTC only. But if people want more, and someone wants
to implement it, great!

Please add your comments, and maybe we'll get a NEP together.

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov

