[Numpy-discussion] Time Zones and datetime64

Mark Wiebe mwwiebe@gmail....
Tue Apr 9 16:46:10 CDT 2013


On Mon, Apr 8, 2013 at 12:24 PM, Chris Barker - NOAA Federal <
chris.barker@noaa.gov> wrote:

> Recent discussion has made it clear that the timezone handling in the
> current (numpy 1.7) version of datetime64 is broken. Below is a
> discussion of some possible solutions, hopefully including most of the
> comments made on the recent thread on this list.
>
> http://mail.scipy.org/pipermail/numpy-discussion/2013-April/066038.html
>
> The intent is that with a bit more discussion (focused, in this thread
> at least) on the time zone issues, rather than other datetime64
> issues, we can start a new datetime64 NEP.
>

This looks great, thanks for putting it together! I've put some comments
inline.

>
> Background:
> ===============================
>
>
> The current version (numpy 1.7) of datetime64 appears to handle
> timezones in the following ways:
>
> datetime64s are assumed to be in UTC internally. Time zone translation
> is done on I/O -- i.e. creating a new datetime64 and outputting to text
> format or as a datetime.datetime object.
>

It might be better to say "defined" instead of "assumed", because that was
an explicit choice.


> When creating a datetime64 from an ISO string, the timezone info in
> the string is respected. If there is no timezone info in the string,
> the system time zone (locale setting) is used. On output
> (i.e. converting to text: __str__ and __repr__) the system locale is
> used to set the timezone.
>
> In [9]: np.datetime64('2013-04-08T12:00:00Z')
> Out[9]: numpy.datetime64('2013-04-08T05:00:00-0700')
>
> In [10]: np.datetime64('2013-04-08T12:00:00')
> Out[10]: numpy.datetime64('2013-04-08T12:00:00-0700')
>
> However, if a datetime.datetime is used without a tzinfo object (the
> common case, as no tzinfo objects are provided with the python
> stdlib), the timezone is assumed to be UTC:
>
> In [13]: dt
> Out[13]: datetime.datetime(2013, 4, 8, 12, 0)
>
> In [14]: np.datetime64(dt)
> Out[14]: numpy.datetime64('2013-04-08T05:00:00.000000-0700')
>
> which can give some odd results, as it's different if you convert the
> datetime object to an ISO string first:
>
> In [15]: np.datetime64(dt.isoformat())
> Out[15]: numpy.datetime64('2013-04-08T12:00:00-0700')
>
> Converting from a datetime64 to a datetime object uses the UTC time
> (the internal representation with no offset).
>
>
> Issues with the current configuration:
> ===============================
>
> Using the locale time zone is a long-standing tradition, and is used by
> the C standard library time functions. However, it is almost always
> NOT what one wants in a typical numpy application. When working with
> Scientific (and financial) datasets, the time zone of the data at hand
> is likely to have nothing to do with the timezone of the computer the
> code is running on. Also, with cloud computing and web applications,
> the time zone of the machine on which the code is running is
> irrelevant to the user. A number of early adopters of datetime64 have
> found that they needed to wrap the creation and use of datetime64
> arrays to override the timezone behavior.
>
> The current implementation may be natural for some interactive use,
> but that's often not the case, and is particularly problematic when
> datetime.datetime.now() gives local time, but with no time zone info,
> so numpy actually appears to shift it.
>
> In [19]: datetime.datetime.now().isoformat()
> Out[19]: '2013-04-08T12:05:26.157475'
>
> In [20]: np.datetime64(datetime.datetime.now())
> Out[20]: numpy.datetime64('2013-04-08T05:05:45.813027-0700')
>
> This is really ugly -- and regardless of whether we like what the std
> lib does, we need to deal with it.
>
> The python standard library datetime implementation uses "naive"
> datetimes by default, with the provision for an optional "tzinfo"
> object, so that the user can supply timezone info if desired. However,
> the library does not provide any tzinfo objects out of the box. This
> is because timezones are messy, complicated, and change over time, and
> the core python devs did not want to be in the position of maintaining
> that code. There is a third party "pytz" package
> (http://pytz.sourceforge.net/) that provides a pretty complete
> implementation of time zone handling for those that need it.
>
> Note also that in the current implementation, the busday functions
> just operate on datetime64[D]. There is no timezone interaction there
> -- which makes it very hard for them to be useful, as it's a bit
> tricky to make sure your datetime64 arrays are in the correct time
> zone for your application. In fact, they are assuming that datetime64
> is time zone naive, even though the I/O functions assume locale time.
>

The datetime64[D] type itself doesn't interact with time zones, for example:

In [2]: np.datetime64('2000-03-12')
Out[2]: numpy.datetime64('2000-03-12')


doesn't use a time zone. Where time zones come into play is when converting
between datetime64[D] and datetime64[s], or other time-unit datetimes:

In [12]: a = np.array(["2012-03-02T22:00", "2013-02-01T01:00"], dtype='M8')

In [13]: a
Out[13]: array(['2012-03-02T22:00-0800', '2013-02-01T01:00-0800'],
      dtype='datetime64[m]')

In [14]: a.astype('M8[D]')
Out[14]: array(['2012-03-03', '2013-02-01'], dtype='datetime64[D]')


The casting rules disallow conversion from time to date units, except under
the 'unsafe' rule. That's unfortunately the default for the astype function
though, so if we override the rule, we get:


In [15]: a.astype('M8[D]', casting='same_kind')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-93cfc21b90d8> in <module>()
----> 1 a.astype('M8[D]', casting='same_kind')

TypeError: Cannot cast array from dtype('<M8[m]') to dtype('<M8[D]')
according to the rule 'same_kind'


Because there's no equivalent to datetime_as_string for converting to
dates, handling the time zone requires this kind of trickery:

In [17]: np.array([x[:10] for x in np.datetime_as_string(a,
                   timezone='local')], dtype='M8[D]')
Out[17]: array(['2012-03-02', '2013-02-01'], dtype='datetime64[D]')
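
If a fixed offset is good enough, the same dates can be had without going
through strings. This is only a sketch under assumptions: the strings are
taken as UTC values (which is not what 1.7 does for strings without a
zone), the zone is a constant UTC-8 with no DST handling, and the names
utc_minutes and offset are mine, purely for illustration:

```python
import numpy as np

# UTC instants corresponding to the two local (UTC-8) times shown above.
utc_minutes = np.array(["2012-03-03T06:00", "2013-02-01T09:00"], dtype="M8[m]")

# Assumed fixed offset for the zone of interest; real zones need DST rules.
offset = np.timedelta64(-8, "h")

# Shift to local wall-clock time first, then truncate to whole days.
local_dates = (utc_minutes + offset).astype("M8[D]")

# Truncating the UTC values directly gives the UTC calendar dates instead.
utc_dates = utc_minutes.astype("M8[D]")

print(local_dates)
print(utc_dates)
```

The point is just that the shift has to happen before the truncation to
days; doing it the other way round gives the UTC date.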


> Proposed Alternatives:
> ======================
>
> Principles:
> ------------------
>
>  * Partial time zone handling is probably worse than none.
>  * The library should never apply the locale timezone (or any other)
> unless explicitly requested by the user.
>  * It should be possible (and easy) to use "naive" datetime64 arrays.
>
> 1) A naive datetime64:
> ====================
>
> This would follow, more or less, the python stdlib datetime approach
> (with no tzinfo object) -- ignore timezones entirely. This model
> leaves all time zone handling up to the user. In general, when working
> with this method, applications either: use UTC everywhere, or use
> "local time" everywhere, where local time is simply "all data is in
> the same time zone" and it's up to the user (or lib code) to make sure
> that's correct.
>
> Issues:
> ------------------
>
> The main issue with a naive datetime64 is what to do on creation if
> time zone information is supplied (i.e. in an ISO 8601 string, or
> datetime object with non-None tzinfo). Options include:
>  - ignore the TZ info
>  - raise an exception if there is a TZ adjustment (other than UTC, 00:00,
> or Z)
>

I'd still raise the exception for 00:00 and Z, to me they're more like the
other time zone specifications than no time zone.
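
To make that concrete, here is a rough sketch of the strict behaviour I
have in mind. parse_naive_iso is a hypothetical helper, not an existing
NumPy function, and it only looks at strings that use the 'T' separator:

```python
import re

import numpy as np

# Matches a trailing zone designator: "Z", "+hh", "+hhmm" or "+hh:mm"
# (and the "-" variants). Only applied to the part after the "T".
_TZ_SUFFIX = re.compile(r"(Z|[+-]\d{2}(:?\d{2})?)$")


def parse_naive_iso(text):
    """Hypothetical helper: parse an ISO 8601 string as a naive datetime64.

    Any zone designator is rejected, including "Z" and "+00:00".
    """
    _date_part, separator, time_part = text.partition("T")
    if separator and _TZ_SUFFIX.search(time_part):
        raise ValueError("time zone designator not allowed: %r" % (text,))
    return np.datetime64(text)


print(parse_naive_iso("2013-04-08T12:00:00"))
try:
    parse_naive_iso("2013-04-08T12:00:00Z")
except ValueError as exc:
    print("rejected:", exc)
```

Checking only the time part matters, because a date-only string such as
'2013-04-08' also ends in a minus sign and two digits.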

>
> I propose that we raise an exception, unless there is a way to pass an
> optional flag in to request timezone conversion.
>
> note that the stdlib datetime package does not provide an ISO8601
> parsing function, so it has ignored the issue.
>
> There is also the issue of what to provide on output/conversion. I propose:
>  - a datetime object with no tzinfo
>  - ISO8601 with no tz adjustment
>
> np.datetime_as_string() could be used with options to allow the user
> to request a time zone offset.
>

Another thing to consider is adding some global state for default printing
of datetimes, similar to that for controlling the number of decimals when
printing floats. I don't like this kind of global state, but it would match
NumPy's current practice.
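
A sketch of what such printing state could look like. It assumes a NumPy
whose print options accept a 'datetime' entry in the formatter dict and
which provides the np.printoptions context manager; neither is available
in 1.7, and format_as_utc is just an illustrative name:

```python
import numpy as np


def format_as_utc(value):
    """Illustrative formatter: render one datetime64 with an explicit 'Z'."""
    return str(np.datetime_as_string(value, timezone="UTC"))


stamps = np.array(["2013-04-08T12:00"], dtype="M8[m]")

# Scoped change to the printing state; it is restored when the block exits.
with np.printoptions(formatter={"datetime": format_as_utc}):
    with_flag = str(stamps)

# Outside the block the default formatting applies again.
default_text = str(stamps)

print(with_flag)
print(default_text)
```

The context-manager form at least keeps the state change local, which
takes some of the sting out of it being global.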


> 2) UTC-only
> ====================
>
> This would be very similar to a naive datetime64 -- there are no
> timezone adjustments with pure UTC -- and would be similar to the
> current implementation, except for I/O:
>
> On conversion to datetime64, time zone offset would be respected, if
> it exists in the ISO8601 string or the datetime object has a tzinfo
> attribute. The value would be stored in UTC.
>
> If there is no timezone info in the input string or datetime objects,
> UTC is assumed.
>
> On output -- UTC is used, no offset computed.
>
> Issues:
> ------------------
>
> The ISO string produced on output would logically contain the "Z" flag
> to indicate UTC. This may confuse some apps that really expect a naive
> datetime.
>
> If there were a way to pass in a flag to create ISO strings indicating
> the time zone, that would be perfect, probably using
> np.datetime_as_string()
>
>
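
For datetime objects, the UTC-only input rule described above is small
enough to sketch. to_utc_datetime64 is a hypothetical name, and the
fixed-offset tzinfo comes from datetime.timezone in the Python 3 standard
library, so this is not something that works against the 2.x stdlib:

```python
import datetime

import numpy as np


def to_utc_datetime64(dt):
    """Hypothetical helper implementing the UTC-only input rule."""
    if dt.tzinfo is not None:
        # Respect the supplied offset, then drop it once the value is in UTC.
        dt = dt.astimezone(datetime.timezone.utc).replace(tzinfo=None)
    # A datetime without tzinfo is taken to be UTC already.
    return np.datetime64(dt)


# Assumed fixed offset of UTC-7, matching the -0700 in the examples above.
minus_seven = datetime.timezone(datetime.timedelta(hours=-7))
aware = datetime.datetime(2013, 4, 8, 12, 0, tzinfo=minus_seven)
naive = datetime.datetime(2013, 4, 8, 12, 0)

print(to_utc_datetime64(aware))
print(to_utc_datetime64(naive))
```
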
> 3) Optional time zone support
> ==========================
>
> This would follow the standard library approach -- provide a hook for
> a tzinfo object -- and if there, handle properly. This would allow one
> to mix and match datetime64s that are in different time zones, etc.
>
> issues:
> ----------------
>
> The biggest issue is that to be useful, you'd need a comprehensive
> tzinfo package. pytz provides one, but then you'd need to go through
> the python layer for every item in an array -- killing performance.
> However, perhaps that would be worth it for those that need it, and
> folks that need performance could use naive datetime64s.
>
> There apparently is also a  datetime library in boost that has a nice
> timezone object which could be used as inspiration for an equivalent
> in NumPy. That could be a lot of work, though.
>
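
To illustrate the performance concern, here is a sketch contrasting the
two paths. The function names are made up, the zone is an assumed constant
UTC-8 built with the Python 3 datetime.timezone class, and a real tzinfo
package would replace it in the per-element path:

```python
import datetime

import numpy as np

utc_values = np.array(["2012-03-03T06:00", "2013-02-01T09:00"], dtype="M8[m]")
zone = datetime.timezone(datetime.timedelta(hours=-8))  # assumed fixed offset


def to_local_per_element(values, tz):
    """Slow path: one Python-level tzinfo call per array element."""
    local = []
    for item in values.tolist():  # each item is a datetime.datetime
        aware = item.replace(tzinfo=datetime.timezone.utc).astimezone(tz)
        local.append(aware.replace(tzinfo=None))
    return np.array(local, dtype="M8[us]").astype(values.dtype)


def to_local_vectorized(values, offset_hours):
    """Fast path: one array operation, valid only for a constant offset."""
    return values + np.timedelta64(offset_hours, "h")


slow = to_local_per_element(utc_values, zone)
fast = to_local_vectorized(utc_values, -8)
print(np.array_equal(slow, fast))
```

Both give the same wall-clock values here, but only the first generalizes
to zones with DST rules, and it pays for that with a Python call per
element.
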
> 4) Full time zone support
> ==========================
>
> This would be similar to the above, except that every datetime64 array
> would be required to carry time zone info. This would probably be
> reasonable, as one could use UTC everywhere if you wanted the simple
> case. But it would require that a comprehensive tzinfo package be
> included with numpy -- likely something we don't want to have to
> maintain (even if someone wants to build it in the first place)
>
> issues:
> -----------
>
> We would still want ways to input/output naive datetimes -- some apps
> simply don't want to deal with all this!
>
>
>
>
>
> As I (Chris Barker) am not in a position to implement anything, I
> advocate the simplest possible approach -- which I think is Naive
> datetime and/or UTC only. But if people want more, and someone wants
> to implement it, great!
>
> Please add your comment, and maybe we'll get a NEP together.
>

Thanks again for putting this together,

Mark


> -Chris
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker@noaa.gov
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>