[Numpy-discussion] timezones and datetime64

Dave Hirschfeld dave.hirschfeld@gmail....
Wed Apr 3 09:38:52 CDT 2013


Nathaniel Smith <njs <at> pobox.com> writes:

> 
> On Wed, Apr 3, 2013 at 2:26 PM, Dave Hirschfeld
> <dave.hirschfeld <at> gmail.com> wrote:
> >
> > This isn't acceptable for my use case (in a multinational company) and I
> > found no reasonable way around it other than bypassing the numpy
> > conversion entirely by setting the dtype to object, manually parsing the
> > strings and creating an array from the list of datetime objects.
> 
> Wow, that's truly broken. I'm sorry.
> 
> I'm skeptical that just switching to UTC everywhere is actually the
> right solution. It smells like one of those solutions that's simple,
> neat, and wrong. (I don't know anything about calendar-time series
> handling, so I have no ability to actually judge this stuff, but
> wouldn't one problem be if you want to know about business days/hours?
> You lose the original day-of-year once you move everything to UTC.)
> Maybe datetime dtypes should be parametrized by both granularity and
> timezone? Or we could just declare that datetime64 is always
> timezone-naive and adjust the code to match?
> 
> I'll CC the pandas list in case they have some insight. Unfortunately
> AFAIK no-one who's regularly working on numpy at this point works with
> datetimes, so we have limited ability to judge solutions... please
> help!
> 
> -n
> 

I think simply setting the timezone to UTC if it's not specified would solve
99% of use cases because, IIUC, the internal representation is UTC, so numpy
would be doing no conversion of the dates that were passed in. It was the
conversion which was the source of the error in my example.
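
As a rough illustration of "treat an unspecified timezone as UTC without
converting anything", here is a minimal sketch using only the standard
library. It is not numpy's implementation, and `assume_utc` is a hypothetical
helper name made up for this example.

```python
from datetime import datetime, timedelta, timezone


def assume_utc(dt):
    """Hypothetical helper: label a naive datetime as UTC without shifting it."""
    if dt.tzinfo is None:
        # replace() only attaches tzinfo; the wall-clock fields are untouched
        return dt.replace(tzinfo=timezone.utc)
    # already timezone-aware: pass it through unchanged
    return dt


naive = datetime(2014, 1, 1)
tagged = assume_utc(naive)

# The date and time fields are the same as what was passed in.
same_fields = tagged.replace(tzinfo=None) == naive

# A value that already carries an offset is returned as-is.
plus_one = datetime(2014, 1, 1, tzinfo=timezone(timedelta(hours=1)))
untouched = assume_utc(plus_one) is plus_one
```

The point of the sketch is that no local-time conversion ever happens: the
input is either labelled as UTC or left alone.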

The only potential issue with this is that the dates might carry along an
incorrect UTC timezone, making it more difficult to work with naive datetimes.

e.g.

In [42]: d = np.datetime64('2014-01-01 00:00:00', dtype='M8[ns]')

In [43]: d
Out[43]: numpy.datetime64('2014-01-01T00:00:00+0000')

In [44]: str(d)
Out[44]: '2014-01-01T00:00:00+0000'

In [45]: pydate(str(d))
Out[45]: datetime.datetime(2014, 1, 1, 0, 0, tzinfo=tzutc())

In [46]: pydate(str(d)) == datetime.datetime(2014, 1, 1)
Traceback (most recent call last):

  File "<ipython-input-46-abfc0fee9b97>", line 1, in <module>
    pydate(str(d)) == datetime.datetime(2014, 1, 1)

TypeError: can't compare offset-naive and offset-aware datetimes


In [47]: pydate(str(d)) == datetime.datetime(2014, 1, 1, tzinfo=tzutc())
Out[47]: True

In [48]: pydate(str(d)).replace(tzinfo=None) == datetime.datetime(2014, 1, 1)
Out[48]: True
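
For anyone re-running this, the same mismatch can be reproduced with only the
standard library. This is a sketch that assumes Python 3.3 or later, where
`==` between naive and aware datetimes returns False instead of raising as in
the session above, while ordering comparisons still raise TypeError.
`timezone.utc` stands in for the `tzutc()` used above.

```python
from datetime import datetime, timezone

aware = datetime(2014, 1, 1, tzinfo=timezone.utc)
naive = datetime(2014, 1, 1)

# On Python 3.3+ mixed equality is simply False rather than an error.
equal_mixed = (aware == naive)

# Ordering comparisons between naive and aware values still raise TypeError.
try:
    aware < naive
    ordering_raises = False
except TypeError:
    ordering_raises = True

# Dropping the tzinfo makes the two values comparable again.
stripped_equal = aware.replace(tzinfo=None) == naive
```

Either way, code that mixes the two kinds of datetime has to strip or attach
the timezone explicitly before comparing.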


In this case it may be best to have numpy not try to set the timezone at all if
none was specified. Given the internal representation is UTC, I'm not sure this
is feasible though, so defaulting to UTC may be the best solution.

Regards,
Dave



