[Numpy-discussion] timezones and datetime64

Mark Wiebe mwwiebe@gmail....
Wed Apr 3 16:03:16 CDT 2013


On Wed, Apr 3, 2013 at 9:33 AM, Chris Barker - NOAA Federal <
chris.barker@noaa.gov> wrote:

> <dave.hirschfeld@gmail.com> wrote:
> >  I found no reasonable way around it other than bypassing the numpy
> conversion entirely
>
> Exactly - we have come to the same conclusion. By the way, it's also
> inconsistent -- an ISO string without a TZ is interpreted to mean
> "use the locale", but a datetime object without a TZ is interpreted as
> UTC, so you get this:
>
> In [68]: dt
> Out[68]: datetime.datetime(2013, 4, 3, 12, 0)
>
> In [69]: np.dateti
> np.datetime64          np.datetime_as_string  np.datetime_data
>
> In [69]: np.datetime64(dt)
> Out[69]: numpy.datetime64('2013-04-03T05:00:00.000000-0700')
>
> In [70]: np.datetime64(dt.iso)
> dt.isocalendar  dt.isoformat    dt.isoweekday
>
> In [70]: np.datetime64(dt.isoformat())
> Out[70]: numpy.datetime64('2013-04-03T12:00:00-0700')
>
> two different results!
>
> (and as it happens, datetime.datetime does not have an ISO string
> parser, so it's not completely trivial to round-trip through that...)



> On Wed, Apr 3, 2013 at 6:49 AM, Nathaniel Smith <njs@pobox.com> wrote:
>
> > Wow, that's truly broken. I'm sorry.
>
> Did you put this in? break out the pitchforks! (  ;-) )


Many aspects of how datetime64 currently behaves are from me. I started out
from the datetime64 NEP, but it wasn't fleshed out enough so I had to fill
in lots of details. I guess your pitchforks are pointing at me. ;)

For this specific part of the code, I think it's hard not to have it broken
one way or another, no matter how we do it. One thing I observed is that
printing the current time looks weird when you're working interactively: if
you get the current time and print it in UTC, it looks like the wrong time
unless you happen to be in UTC. Python's datetime doesn't help the situation
by having datetime.now() return a 'local' time.

In [1]: import numpy as np

In [2]: from datetime import datetime

In [3]: np.datetime64('now')
Out[3]: numpy.datetime64('2013-04-03T12:17:58-0700')

In [4]: np.datetime_as_string(np.datetime64('now'), timezone='UTC')
Out[4]: '2013-04-03T19:17:59Z'

In [5]: datetime.now()
Out[5]: datetime.datetime(2013, 4, 3, 12, 18, 2, 582000)

In [6]: datetime.now().isoformat()
Out[6]: '2013-04-03T12:18:06.796000'

In [7]: np.datetime64(datetime.now())
Out[7]: numpy.datetime64('2013-04-03T05:18:15.525000-0700')

In [8]: np.datetime64(datetime.now().isoformat())
Out[8]: numpy.datetime64('2013-04-03T12:18:25.291000-0700')
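
To make the mismatch in outputs 7 and 8 concrete without depending on any
particular NumPy version, here is a minimal sketch using only Python's
standard library. It assumes Python 3 (datetime.timezone is not in the
Python 2 stdlib), and the fixed -07:00 offset is just a stand-in for the
locale of the session above.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset standing in for the local zone of the session (assumption).
LOCAL = timezone(timedelta(hours=-7))

naive = datetime(2013, 4, 3, 12, 0)

# Reading 1: the naive value is already UTC (what np.datetime64(dt) did).
as_utc = naive.replace(tzinfo=timezone.utc)

# Reading 2: the naive value is local wall-clock time
# (what np.datetime64(dt.isoformat()) did).
as_local = naive.replace(tzinfo=LOCAL)

# The same naive value names two instants seven hours apart.
gap = as_local - as_utc

print(as_utc.astimezone(LOCAL).isoformat())  # 2013-04-03T05:00:00-07:00
print(as_local.isoformat())                  # 2013-04-03T12:00:00-07:00
print(gap)                                   # 7:00:00
```

Neither reading is wrong on its own; the trouble is that the two input paths
pick different ones.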


> > I'm skeptical that just switching to UTC everywhere is actually the
> > right solution. It smells like one of those solutions that's simple,
> > neat, and wrong.
>
> well, actually, I don't think UTC everywhere is quite what's proposed
> -- really it's naive datetimes -- it would be up to the
> user/application to make sure the time zones are consistent.
>

It seems to me that adding a time zone to the datetime64 metadata might be
a good idea, with None allowed so that it behaves like Python's naive
datetimes. This wouldn't be a trivial addition, though. Using Python's
timezone object doesn't seem like a good idea, because values would have to
be converted to/from Python's datetime every time they are processed, which
would remove the performance benefits of NumPy. The boost datetime library
has a nice timezone object which could be used as inspiration for an
equivalent in NumPy, but I think any way we cut it would be a lot of work.
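
As a rough illustration of what "a timezone in the metadata, or None for
naive" could mean, here is a pure-Python sketch. DatetimeMeta and render are
hypothetical names invented for this example, not NumPy API; the sketch only
handles a fixed UTC offset and the seconds unit, and it ignores the
performance concerns above entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1)


@dataclass(frozen=True)
class DatetimeMeta:
    """Hypothetical dtype metadata: a unit plus an optional fixed UTC offset."""
    unit: str = "s"                       # only seconds are handled here
    offset_minutes: Optional[int] = None  # None means timezone-naive


def render(ticks: int, meta: DatetimeMeta) -> str:
    """Format an integer count of seconds since the epoch according to meta."""
    if meta.unit != "s":
        raise ValueError("this sketch only handles the 's' unit")
    value = EPOCH + timedelta(seconds=ticks)
    if meta.offset_minutes is None:
        # Naive: print the stored value as-is, with no offset attached.
        return value.isoformat()
    # Aware: treat the stored value as UTC and shift it to the metadata offset.
    tz = timezone(timedelta(minutes=meta.offset_minutes))
    return value.replace(tzinfo=timezone.utc).astimezone(tz).isoformat()


ticks = int((datetime(2013, 4, 3, 19, 17, 58) - EPOCH).total_seconds())

print(render(ticks, DatetimeMeta()))                     # 2013-04-03T19:17:58
print(render(ticks, DatetimeMeta(offset_minutes=-420)))  # 2013-04-03T12:17:58-07:00
```

The stored integer is the same in both calls; only the metadata decides
whether an offset is applied and printed.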


> Which does mean that parsing a ISO string with a timezone becomes
> problematic...


Yeah, there are a number of cases.

How would it transform '2013-04-03T12:18' to a datetime64 with a timezone
by default? Probably by using the timezone in the datetime64's metadata.
How would it transform '2013-04-03T12:18Z' or '2013-04-03T12:18-0700' to a
datetime64 with no timezone? Do we throw an error in the default
conversion, and have a separate parsing function that allows more control?
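
One way to picture the "separate parsing function" option is the sketch
below. parse_iso, default_tz and allow_offset are hypothetical names made up
for this example (nothing like this exists in NumPy), it returns Python
datetime objects rather than datetime64 values, and it accepts only a small
subset of ISO 8601.

```python
import re
from datetime import datetime, timedelta, timezone

_ISO = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(?::\d{2})?)"
    r"(?P<tz>Z|[+-]\d{2}:?\d{2})?$"
)


def parse_iso(text, default_tz=None, allow_offset=True):
    """Hypothetical parser returning a naive or aware datetime.

    default_tz   -- tzinfo applied when the string carries no offset
                    (None keeps the result naive)
    allow_offset -- if False, a string with 'Z' or an offset is an error
    """
    match = _ISO.match(text)
    if match is None:
        raise ValueError("not a supported ISO 8601 string: %r" % text)
    stamp, tz = match.group("stamp"), match.group("tz")
    fmt = "%Y-%m-%dT%H:%M:%S" if stamp.count(":") == 2 else "%Y-%m-%dT%H:%M"
    value = datetime.strptime(stamp, fmt)
    if tz is None:
        # No offset in the string: stay naive or apply the caller's default.
        return value if default_tz is None else value.replace(tzinfo=default_tz)
    if not allow_offset:
        raise ValueError("offset given but a naive result was requested: %r" % text)
    if tz == "Z":
        return value.replace(tzinfo=timezone.utc)
    sign = -1 if tz[0] == "-" else 1
    digits = tz[1:].replace(":", "")
    delta = timedelta(hours=int(digits[:2]), minutes=int(digits[2:]))
    return value.replace(tzinfo=timezone(sign * delta))


print(parse_iso("2013-04-03T12:18"))        # 2013-04-03 12:18:00
print(parse_iso("2013-04-03T12:18Z"))       # 2013-04-03 12:18:00+00:00
print(parse_iso("2013-04-03T12:18-0700"))   # 2013-04-03 12:18:00-07:00
```

With allow_offset=False the last two strings raise ValueError, which is the
"throw an error in the default conversion" behaviour asked about above.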


> > (I don't know anything about calendar-time series
> > handling, so I have no ability to actually judge this stuff, but
> > wouldn't one problem be if you want to know about business days/hours?
>
> right -- then you'd want to use local time, so numpy might think it's
> ISO, but it'd actually be local time. Anyway, at the moment, I don't
> think datetime64 does this right anyway. I don't see mention of the
> timezone in the busday functions. I haven't checked to see if they use
> the locale TZ or ignore it, but either way is wrong (actually, using
> the locale setting is worse...)


The busday functions just operate on datetime64[D]. There is no timezone
interaction there, except for how a datetime with a date unit converts
to/from a datetime which includes time.
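
A short example of the date-only behaviour, using the existing
np.is_busday, np.busday_count and np.busday_offset functions with their
default Monday-to-Friday week mask. All inputs are plain dates, so no
timezone handling is involved.

```python
import numpy as np

day = np.datetime64('2013-04-03')   # a Wednesday; the unit is inferred as [D]
print(day.dtype)                    # datetime64[D]

is_open = np.is_busday(day)
print(is_open)                      # True

# Business days in the half-open range [2013-04-01, 2013-04-08).
count = np.busday_count('2013-04-01', '2013-04-08')
print(count)                        # 5

# One business day after Friday 2013-04-05 is the following Monday.
next_day = np.busday_offset('2013-04-05', 1, roll='forward')
print(next_day)                     # 2013-04-08
```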


> > Maybe datetime dtypes should be parametrized by both granularity and
> > timezone?
>
> That may be a good option. However, I suspect it's pretty hard to
> actually use the timezone correctly and consistently, so I'm nervous
> about that. In any case, we'd need to make sure that the user could
> specify timezone on I/O and busday calculations, etc, and *never*
> assume the locale TZ (Or anything else about locale) unless asked for.
> Using the locale TZ is almost never the right thing to do for the kind
> of applications numpy is used for.


I think for local, interactive use, the locale timezone is good, but for
non-interactive use it's not. NumPy plays roles in both contexts, and has
many features that are skewed towards the interactive context, so it's not
clear to me that excluding the locale TZ would be a good idea.


> > Or we could just declare that datetime64 is always
> > timezone-naive and adjust the code to match?
>
> That would be the easy way to handle it -- from the numpy side, anyway.
>
> > I'll CC the pandas list in case they have some insight.
>
> I suspect pandas has their own way of dealing with all these issues
> already. Which makes me think that numpy should take the same approach
> as the python stdlib: provide a core datatype, but leave the use-case
> specific stuff for others to build on. For instance, it seems really
> odd to have the busday* functions in core numpy...


I believe Pandas is using datetime64[ns] for everything, and uses its own
code to allow for numpy 1.6 compatibility. It borrowed some code from numpy
1.7 to make this possible.


> > Unfortunately
> > AFAIK no-one who's regularly working on numpy at this point works with
> > datetimes, so we have limited ability to judge solutions...
>
> well, that explains how this happened!
>
> > please help!
>
> in 1.7, it is still listed as experimental, so you could say this is
> all going as planned: release something we can try to use, and see
> what we find out when using it!
>
> I _think_ one reasonable option may be:
>
> 1) Internal is UTC
> 2) On input:
>    a) Default for no-time-zone-specified is UTC (both from datetime
> objects and ISO strings)
>    b) respect TZ if given, converting to UTC
> 3) On output:
>    a) default to UTC
>    b) provide a way for the user to specify the timezone desired
>       ( perhaps a TZ attribute somewhere, or functions to specifically
> convert to ISO strings and datetime objects that take an optional TZ
> parameter.)
> 4) busday* and the like allow a way to specify TZ
>
> Issues I immediately see with this:
>    Respecting the TZ on output is a problem because:
>      1)  if people want "naive" datetimes, they will get UTC ISO strings,
> i.e.:
>               '2013-04-03T05:00:00Z' rather than '2013-04-03T05:00:00'
>          - so there should be a way to specify "naive" or None as a
> timezone.
>
>      2)  the python datetime module doesn't have any tzinfo objects
> built in -- so to respect timezones, numpy would need to maintain its
> own, or depend on pytz
>
> Given all this, maybe naive is the way to go, perhaps mirroring
> datetime.datetime, and having an optional tzinfo object attribute. (by
> the way, I'm confused where that would live -- in the dtype instance?
> in the array?)
>
> Issue with Naive: what do you do with an ISO string that specifies a TZ
> offset?
>
> I'm beginning to see why the datetime doesn't support reading ISO
> strings -- it would need to deal with timezones in that case!
>
> Another note about Timezones and ISO -- it doesn't really support
> timezones -- you specify an offset from UTC, that's it -- so you don't
> know if that is, for instance, Mountain Standard time or Pacific
> Daylight Time. All you can do with it is convert to UTC, but you don't
> have a way to convert back, as you don't know what the timezone is.
> We'd be taking on a heck of a mess to support this! Hmm -- maybe only
> support ISO-like -- i.e. all we do is keep an offset around that can
> be re-applied on output if you want -- that's it.
>
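
The point about offsets versus timezones can be seen with the standard
library alone. The sketch below assumes Python 3.7 or later (for
datetime.fromisoformat), and the MST and PDT objects are hand-labelled fixed
offsets made up for the example, not entries from a timezone database.

```python
from datetime import datetime, timedelta, timezone

# Hand-labelled fixed offsets (assumption): on 2013-07-01 Arizona is on
# Mountain Standard Time and California on Pacific Daylight Time, both UTC-7.
MST = timezone(timedelta(hours=-7), "MST")
PDT = timezone(timedelta(hours=-7), "PDT")

phoenix = datetime(2013, 7, 1, 12, 0, tzinfo=MST)
los_angeles = datetime(2013, 7, 1, 12, 0, tzinfo=PDT)

# The ISO strings are identical: the zone name never reaches the string.
print(phoenix.isoformat())      # 2013-07-01T12:00:00-07:00
print(los_angeles.isoformat())  # 2013-07-01T12:00:00-07:00

# Parsing the string back recovers the offset, not the zone it came from.
round_trip = datetime.fromisoformat(phoenix.isoformat())
print(round_trip.utcoffset())   # -1 day, 17:00:00
```

After the round trip there is nothing left to say whether the value should
follow Arizona or California rules once daylight saving time ends.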

Datetimes are complicated! The biggest advantage of using ISO for the
default string format is that it's unambiguous. It doesn't have the problem
of something like '01/02/03', which could be interpreted in several
different ways depending on where in the world you are.
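
For illustration, here are three plausible readings of that string using
the standard library's datetime.strptime. The three format strings are just
common regional conventions picked for the example.

```python
from datetime import datetime

text = "01/02/03"

# Three regional conventions, three different dates from the same string.
readings = {
    "month/day/year": datetime.strptime(text, "%m/%d/%y").date(),
    "day/month/year": datetime.strptime(text, "%d/%m/%y").date(),
    "year/month/day": datetime.strptime(text, "%y/%m/%d").date(),
}

for convention, value in readings.items():
    print(convention, value.isoformat())
# month/day/year 2003-01-02
# day/month/year 2003-02-01
# year/month/day 2001-02-03
```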

I suspect adding a timezone to the datetime64 metadata is the way to
proceed. We probably need to start up a new NEP about amending datetime64.
The previous one is here:
https://github.com/numpy/numpy/blob/master/doc/neps/datetime-proposal.rst


> That's it for now -- thanks for engaging!
>
> -Chris
>
> PS: I'm pretty sure that the C stdlib time handling functions give you
> no choice but to use the locale when they convert to strings, etc --
> this is a freaking nightmare, and I'm wondering if that's in fact why
> numpy does it. i.e it's easy to use the C lib functions, but writing
> your own requires the full TZ database, handling DST, etc. etc....


The C stdlib provides functions for doing timezone conversions with the
locale, but going deeper than that becomes a bit more OS-specific. This
seems like the kind of service the OS should provide, so that all libraries
would get updates to new timezone databases when they're updated, etc, but
unfortunately things aren't that simple.
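
Python's time module is a thin wrapper over these C functions, so it gives a
quick way to see the split between the UTC path and the local path. This is
only a sketch of the stdlib behaviour, not of what NumPy does internally,
and the timestamp is an arbitrary value chosen for the example.

```python
import calendar
import time

stamp = 1365016678  # 2013-04-03T19:17:58Z, seconds since the epoch

utc_fields = time.gmtime(stamp)       # wraps C gmtime(): always UTC
local_fields = time.localtime(stamp)  # wraps C localtime(): uses the process TZ

utc_text = time.strftime("%Y-%m-%dT%H:%M:%SZ", utc_fields)
print(utc_text)                       # 2013-04-03T19:17:58Z

# The local rendering depends on the machine's timezone settings.
print(time.strftime("%Y-%m-%dT%H:%M:%S", local_fields))

# Reading the local fields as if they were UTC exposes the local offset.
offset_seconds = calendar.timegm(local_fields) - stamp
print(offset_seconds)
```

Anything beyond "UTC" and "whatever this process is configured for" needs
the timezone database, which is where the OS-specific part starts.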

Thanks,
Mark


>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker@noaa.gov