[Numpy-discussion] timezones and datetime64
Chris Barker - NOAA Federal
Tue Apr 2 14:42:51 CDT 2013
We've recently run into some issues with time zones and datetime64 (in
numpy 1.7). Specifically, there no longer seems to be a way to have
what the python datetime module calls "naive" datetime objects -- i.e.
ones that have no awareness of timezones. Moreover, datetime64 seems
to enforce the locale settings of the machine you're running on, with
no way to turn that off. This is a "Bad Thing™".
Time zones are a nightmare -- particularly a nightmare for computer
code, and particularly with Daylight Saving Time issues. As a result, an
application needs to either be fully-properly timezone aware, and
manage it all properly, or be completely timezone naive -- mixing the
two is a recipe for disaster!
(OK, OK, I'm being a little histrionic here...)
Getting timezone handling right is actually pretty tricky, and takes a
fair bit of code, and is incompatible with some simple libraries. As a
result, many of us punt and go with the naive approach. In particular
a major app I'm working on has always made it the responsibility of
the user to provide all input in the same timezone. If/when we do get
smarter, we'll still treat timezone handling as an I/O issue --
internally, all datetimes will be in the same, naive, timezone. So I
want it to be possible, and ideally easy, to use naive datetime64s.
One way to think about it is that the UTC time zone is equivalent to a
naive object -- if you use UTC everywhere, no timezone conversions
will take place, but with numpy 1.7, this can be tricky.
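To make the "naive internally, timezones only at I/O" idea concrete, here is a minimal sketch using only the standard library's datetime module rather than numpy. The helper name to_naive_utc is hypothetical, and the fixed -07:00 offset is just an illustrative input.

```python
from datetime import datetime, timedelta, timezone


def to_naive_utc(dt):
    """Hypothetical I/O-boundary helper: return a naive datetime.

    Aware inputs are converted to UTC and stripped of tzinfo;
    naive inputs are passed through unchanged (caller's convention).
    """
    if dt.tzinfo is None:
        return dt
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


# An aware input with an explicit -07:00 offset (illustrative value).
aware = datetime(2013, 4, 2, 12, tzinfo=timezone(timedelta(hours=-7)))
internal = to_naive_utc(aware)

# Naive arithmetic is plain wall-clock arithmetic: no DST jump is applied.
later = datetime(2013, 3, 10, 1, 30) + timedelta(hours=1)
```

Once everything is naive, nothing in the computation depends on the machine's settings; whether the values mean UTC or some agreed local time is a convention the application has to document.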
These issues come up in two core places:
1) Creating a datetime64
from the docs:
"ISO 8601 specifies to use the local time zone if none is explicitly given"
Well, yes and no -- ISO 8601 specifies that if no time zone is given,
it means "local time". But it does not specify what local time means,
nor how it should be used in a computer library.
datetime64 seems to define it as "use the computer's locale setting
for the time zone". It also seems to not just keep the time zone
around as metadata, but actually change the internal representation
to UTC. This is a really bad idea:
- people have their time zones set wrong
- or move their laptops around between time zones!
- people, for example, run models for a location that is not their
computer time zone
- people run apps through the web -- who knows or should care what
time zone the server is in?
Note that this isn't just the string representation, but also
conversion from a datetime.datetime object:
In : dt = datetime.datetime(2013, 4, 2, 12)
In : dt64 = np.datetime64(dt)
In : dt64
This is particularly problematic, as the built-in datetime module has
no tzinfo objects -- you need a third-party library to supply them...
2) Creating something else from a datetime64:
In : np.datetime64('2013-04-02T12:00:00-07').astype(datetime.datetime)
Out: datetime.datetime(2013, 4, 2, 19, 0)
so I put in 12:00, but get back 19:00 -- and the datetime.datetime
object has lost the timezone info.
In : str(np.datetime64('2013-04-02T12:00:00Z'))
I put in a UTC datetime, but get a string representation in my locale
time -- this can get pretty ugly, particularly if you have to deal
Using the locale also means that you have to do DST whether you want
to or not, which can be weird:
In : np.arange('2013-03-10T07Z', '2013-03-10T12Z', dtype='datetime64[h]')
array(['2013-03-09T23-0800', '2013-03-10T00-0800', '2013-03-10T01-0800',
'2013-03-10T03-0700', '2013-03-10T04-0700'], dtype='datetime64[h]')
so not all the elements are in the same TZ
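The mixed offsets above can be reproduced with the standard library alone. This is only a sketch of the arithmetic, not of numpy's internals: it assumes a US Pacific machine (which the -0800/-0700 output suggests) and hard-codes the 2013 spring-forward instant, 2013-03-10 10:00 UTC, instead of consulting a time zone database. The names PST, PDT, SWITCH_UTC and to_pacific are made up for the illustration.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")
# Assumption: US Pacific switched to daylight time at this UTC instant.
SWITCH_UTC = datetime(2013, 3, 10, 10, tzinfo=timezone.utc)


def to_pacific(utc_dt):
    """Render an aware UTC datetime with the offset in force at that instant."""
    tz = PDT if utc_dt >= SWITCH_UTC else PST
    return utc_dt.astimezone(tz)


# Five evenly spaced UTC hours, like the arange call above.
hours = [datetime(2013, 3, 10, h, tzinfo=timezone.utc) for h in range(7, 12)]
labels = [to_pacific(h).strftime("%Y-%m-%dT%H%z") for h in hours]
```

The underlying values are evenly spaced in UTC; only the printed labels skip from 01 to 03, because the offset used for display changes partway through the array.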
what happens when you want to go the reverse route -- very odd things with DST:
In : np.datetime64('2013-03-10T01:30')
In : np.datetime64('2013-03-10T02:00')
# I put in 2:00, get back 3:00 !
In : np.datetime64('2013-03-10T02:30')
# I put in 2:30, get back 3:30 !
In : np.datetime64('2013-03-10T03:00')
# I put in 3:00, get back 3:00 !
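One plausible reading of these results, sketched with the standard library: the input wall-clock time is converted to UTC using the offset assumed on the way in, and the stored UTC value is printed with the offset in force on the way out. The code assumes a US Pacific machine and a hard-coded 2013 transition; round_trip and the constants are hypothetical names, and this is not taken from numpy's source.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")
# Assumption: the switch to daylight time happened at 2013-03-10 10:00 UTC,
# i.e. local clocks jumped from 02:00 straight to 03:00.
SWITCH_UTC = datetime(2013, 3, 10, 10, tzinfo=timezone.utc)
FIRST_PDT_WALL = datetime(2013, 3, 10, 3)


def round_trip(wall):
    """Read a naive local time, store it as UTC, and print it back as local."""
    # Times before 03:00 (including the nonexistent 02:xx hour) are read as PST.
    tz_in = PST if wall < FIRST_PDT_WALL else PDT
    stored = wall.replace(tzinfo=tz_in).astimezone(timezone.utc)
    # On output, use whichever offset applies to the stored instant.
    tz_out = PDT if stored >= SWITCH_UTC else PST
    return stored.astimezone(tz_out).replace(tzinfo=None)


results = {
    t: round_trip(datetime(2013, 3, 10, *t)).strftime("%H:%M")
    for t in [(1, 30), (2, 0), (2, 30), (3, 0)]
}
```

Under these assumptions 02:00 and 03:00 both land on the same stored instant, which is exactly why a naive type that never applies such rules is easier to reason about.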
To deal with all this, what we'll have to do is ensure that we are
using UTC everywhere, and not ever use the built-in string
representation. As you can see from the above, that's kind of a pain
-- datetime.datetimes are often not timezone aware, people put
whatever strings they put in, etc.
My understanding is that datetime64 is still experimental, and thus
we have room to make some changes, so I propose:
1) allow a "naive" datetime64 -- one with no specified timezone.
2) have the default for ISO string interpretation be naive if no TZ is specified
3) never use the locale setting unless explicitly asked for.
We'd need a way to specify the timezone in string formatting, etc.; I'm not
sure how to do that.
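For comparison, here is roughly what that behaviour looks like with the standard library's datetime, as a sketch only. parse_naive and format_iso are hypothetical names, not a proposed numpy API, and the "naive values are UTC" convention in format_iso is an assumption made for the example.

```python
from datetime import datetime, timedelta, timezone


def parse_naive(text):
    """Parse an ISO 8601 string; with no offset in the text, the result is naive.

    The machine's time zone setting is never consulted.
    """
    return datetime.fromisoformat(text)


def format_iso(dt, utc_offset_hours=None):
    """Format a naive datetime; an offset appears only if explicitly requested.

    Assumption for this sketch: when an offset is requested, the naive
    value is taken to be UTC and rendered at that fixed offset.
    """
    if utc_offset_hours is None:
        return dt.isoformat()
    tz = timezone(timedelta(hours=utc_offset_hours))
    return dt.replace(tzinfo=timezone.utc).astimezone(tz).isoformat()


parsed = parse_naive("2013-04-02T12:00:00")
plain = format_iso(parsed)
shifted = format_iso(datetime(2013, 4, 2, 19), utc_offset_hours=-7)
```

Here plain round-trips the input text unchanged, and an offset shows up in the output only because the caller passed one.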
A couple questions:
Are there docs defining the internals of timezone handling, and how
one might change the timezone of an existing datetime64 array?
As I poke at this a bit, I'm noticing that maybe time zones aren't
handled at all internally -- rather, the conversion is done to UTC
when creating a datetime64, and conversion is then done to the locale
when creating a string representation -- maybe nothing inside at all.
Does the timezone info survive saving in npz, etc?
PS: may have found a bug messing with arange and datetime64:
In : a = np.arange(np.datetime64('2013-04-02T12:00:00Z'))
Bus error: 10
( repeatable with 1.7.0, py2.7 OS-X 32 bit )
not good, even though that probably shouldn't be legal anyway. I'm
guessing it's using the raw 64 bit integer and trying to build an
array that big -- but it'd be better to get a ValueError.
Thoughts? In particular, how does Pandas or any other time series
package deal with all this?
Christopher Barker, Ph.D.
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception