[Numpy-discussion] timezones and datetime64
Chris Barker - NOAA Federal
Wed Apr 3 11:33:03 CDT 2013
> I found no reasonable way around it other than bypassing the numpy conversion entirely
Exactly - we have come to the same conclusion. By the way, it's also
inconsistent -- an ISO string without a TZ is interpreted to mean
"use the locale", but a datetime object without a TZ is interpreted as
UTC, so you get this:
In : dt
Out: datetime.datetime(2013, 4, 3, 12, 0)
In : np.datetime64(dt)
In : np.datetime64(dt.isoformat())
two different results!
(and as it happens, datetime.datetime does not have an ISO string
parser, so it's not completely trivial to round-trip through that...)
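
To make the mismatch concrete without depending on any particular numpy
version, here is a minimal stdlib-only sketch of the two interpretations
described above. The fixed UTC-5 offset is just an assumption standing in
for "the locale"; it is not what numpy itself computes.

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2013, 4, 3, 12, 0)

# Assumption for illustration: the machine's locale is a fixed UTC-5 offset.
assumed_local = timezone(timedelta(hours=-5))

# Interpretation 1: a naive value is taken to be UTC already.
as_utc = naive.replace(tzinfo=timezone.utc)

# Interpretation 2: a naive value is taken to be local time, then converted.
as_local = naive.replace(tzinfo=assumed_local).astimezone(timezone.utc)

difference = as_local - as_utc
print(as_utc.isoformat())
print(as_local.isoformat())
print(difference)
```

Same wall-clock input, two different instants, and the gap is exactly the
assumed local offset.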
On Wed, Apr 3, 2013 at 6:49 AM, Nathaniel Smith <firstname.lastname@example.org> wrote:
> Wow, that's truly broken. I'm sorry.
Did you put this in? break out the pitchforks! ( ;-) )
> I'm skeptical that just switching to UTC everywhere is actually the
> right solution. It smells like one of those solutions that's simple,
> neat, and wrong.
well, actually, I don't think UTC everywhere is quite what's proposed
-- really it's naive datetimes -- it would be up to the
user/application to make sure the time zones are consistent.
Which does mean that parsing an ISO string with a timezone becomes problematic...
> (I don't know anything about calendar-time series
> handling, so I have no ability to actually judge this stuff, but
> wouldn't one problem be if you want to know about business days/hours?
right -- then you'd want to use local time, so numpy might think it's
ISO, but it'd actually be local time. Anyway, at the moment, I don't
think datetime64 does this right anyway. I don't see mention of the
timezone in the busday functions. I haven't checked to see if they use
the locale TZ or ignore it, but either way is wrong (actually, using
the locale setting is worse...)
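
A small sketch of why the timezone matters for business-day logic: the
same instant can land on a weekday in one zone and on a weekend in
another. This uses only the stdlib; `is_weekday` is a hypothetical helper
and the UTC-5 offset is an assumption, neither is part of numpy.

```python
from datetime import datetime, timedelta, timezone

def is_weekday(instant, tz):
    """Return True if `instant` is Monday-Friday when viewed in `tz`."""
    return instant.astimezone(tz).weekday() < 5

# One instant, stored as UTC.
instant = datetime(2013, 4, 6, 2, 0, tzinfo=timezone.utc)

# Assumption for illustration: a fixed UTC-5 offset for the "business" zone.
utc_minus_5 = timezone(timedelta(hours=-5))

in_utc = is_weekday(instant, timezone.utc)    # Saturday 02:00 in UTC
in_local = is_weekday(instant, utc_minus_5)   # Friday 21:00 at UTC-5
print(in_utc, in_local)
```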
> Maybe datetime dtypes should be parametrized by both granularity and
That may be a good option. However, I suspect it's pretty hard to
actually use the timezone correctly and consistently, so I'm nervous
about that. In any case, we'd need to make sure that the user could
specify timezone on I/O and busday calculations, etc, and *never*
assume the locale TZ (Or anything else about locale) unless asked for.
Using the locale TZ is almost never the right thing to do for the kind
of applications numpy is used for.
> Or we could just declare that datetime64 is always
> timezone-naive and adjust the code to match?
That would be the easy way to handle it -- from the numpy side, anyway.
> I'll CC the pandas list in case they have some insight.
I suspect pandas has their own way of dealing with all these issues
already. Which makes me think that numpy should take the same approach
as the python stdlib: provide a core datatype, but leave the use-case
specific stuff for others to build on. For instance, it seems really
odd to have the busday* functions in core numpy...
> AFAIK no-one who's regularly working on numpy this point works with
> datetimes, so we have limited ability to judge solutions...
well, that explains how this happened!
> please help!
in 1.7, it is still listed as experimental, so you could say this is
all going as planned: release something we can try to use, and see
what we find out when using it!
I _think_ one reasonable option may be:
1) Internal is UTC
2) On input:
   a) Default for no-time-zone-specified is UTC (both from datetime
      objects and ISO strings)
   b) respect TZ if given, converting to UTC
3) On output:
   a) default to UTC
   b) provide a way for the user to specify the timezone desired
      (perhaps a TZ attribute somewhere, or functions to specifically
      convert to ISO strings and datetime objects that take an optional TZ)
4) busday* and the like allow a way to specify TZ
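
Points 1-3 of the option above can be sketched roughly like this, in plain
Python rather than numpy. `to_internal_utc` and `format_output` are
hypothetical names made up for this sketch, and it leans on
`datetime.fromisoformat`, which needs Python 3.7 or later.

```python
from datetime import datetime, timedelta, timezone

def to_internal_utc(value):
    """Hypothetical input rule: no TZ means UTC; a given TZ is converted to UTC."""
    if isinstance(value, str):
        value = datetime.fromisoformat(value)  # requires Python 3.7+
    if value.tzinfo is None:
        # 2a) nothing specified: treat as UTC, never as the locale
        return value.replace(tzinfo=timezone.utc)
    # 2b) respect the given offset, converting to UTC
    return value.astimezone(timezone.utc)

def format_output(value, tz=timezone.utc):
    """Hypothetical output rule: UTC by default, caller may ask for another TZ."""
    return value.astimezone(tz).isoformat()

from_naive = to_internal_utc("2013-04-03T12:00:00")
from_offset = to_internal_utc("2013-04-03T12:00:00-05:00")
shown = format_output(from_offset, timezone(timedelta(hours=-7)))
print(from_naive, from_offset, shown)
```

Note that the locale never enters into it: every conversion is driven by
an explicit offset or by the UTC default.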
Issues I immediately see with this:
Respecting the TZ on output is a problem because:
1) if people want "naive" datetimes, they will get UTC ISO strings, i.e.:
'2013-04-03T05:00:00Z' rather than '2013-04-03T05:00:00'
- so there should be a way to specify "naive" or None as a timezone.
2) the python datetime module doesn't have any tzinfo objects
built in -- so to respect timezones, numpy would need to maintain its
own, or depend on pytz
Given all this, maybe naive is the way to go, perhaps mirroring
datetime.datetime, and having an optional tzinfo object attribute. (by
the way, I'm confused where that would live -- in the dtype instance?
in the array?)
Issue with Naive: what do you do with an ISO string that specifies a TZ offset?
I'm beginning to see why the datetime module doesn't support reading ISO
strings -- it would need to deal with timezones in that case!
Another note about Timezones and ISO -- it doesn't really support
timezones -- you specify an offset from UTC, that's it -- so you don't
know if that is, for instance, Mountain Standard time or Pacific
Daylight Time. All you can do with it is convert to UTC, but you don't
have a way to convert back, as you don't know what the timezone is.
We'd be taking on a heck of a mess to support this! Hmm -- maybe only
support ISO-like -- i.e. all we do is keep an offset around that can
be re-applied on output if you want -- that's it.
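
The "keep an offset around" idea might look something like the sketch
below. It is stdlib-only; `parse_keep_offset` and `format_with_offset` are
hypothetical names, and `datetime.fromisoformat` needs Python 3.7 or later.

```python
from datetime import datetime, timezone

def parse_keep_offset(iso_string):
    """Return (UTC datetime, original UTC offset) for an ISO string with an offset."""
    aware = datetime.fromisoformat(iso_string)  # requires Python 3.7+
    if aware.tzinfo is None:
        raise ValueError("expected an explicit UTC offset")
    return aware.astimezone(timezone.utc), aware.utcoffset()

def format_with_offset(utc_value, offset):
    """Re-apply a stored offset on output; no named timezone is involved."""
    return utc_value.astimezone(timezone(offset)).isoformat()

utc_value, offset = parse_keep_offset("2013-04-03T05:00:00-07:00")
round_tripped = format_with_offset(utc_value, offset)
print(utc_value.isoformat(), offset, round_tripped)
```

The stored offset is just a number of hours and minutes: -07:00 could be
Mountain Standard or Pacific Daylight time, and nothing here can tell
them apart, which is exactly the limitation described above.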
That's it for now -- thanks for engaging!
PS: I'm pretty sure that the C stdlib time handling functions give you
no choice but to use the locale when they convert to strings, etc --
this is a freaking nightmare, and I'm wondering if that's in fact why
numpy does it. i.e., it's easy to use the C lib functions, but writing
your own requires the full TZ database, handling DST, etc. etc....
Christopher Barker, Ph.D.
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception