[Numpy-discussion] timezones and datetime64

Dave Hirschfeld dave.hirschfeld@gmail....
Wed Apr 3 08:26:50 CDT 2013


Andreas Hilboll <lists <at> hilboll.de> writes:

> 
> > 
> > I think your point about using current timezone in interpreting user
> > input being dangerous is probably correct --- perhaps UTC all the way
> > would be a safer (and simpler) choice?
> 
> +1
> 

+10 from me!

I recently hit a bug caused by numpy interpreting date strings that carry no
timezone as being in the local timezone.

The data comes from a database query that supplies no timezone information
(the dates are stored as strings). The assumption is that the user doesn't
need to know the timezone, i.e. the dates are timezone naive.

Working out the correct timezones would be fairly laborious, but whatever the 
correct timezones are, they're certainly not the timezone the current user 
happens to find themselves in!

e.g.

In [32]: rs = [
    ...: (u'2000-01-17 00:00:00.000000', u'2000-02-01', u'2000-02-29', 0.1203),
    ...: (u'2000-01-26 00:00:00.000000', u'2000-02-01', u'2000-02-29', 0.1369),
    ...: (u'2000-01-18 00:00:00.000000', u'2000-03-01', u'2000-03-31', 0.1122),
    ...: (u'2000-02-25 00:00:00.000000', u'2000-03-01', u'2000-03-31', 0.1425)
    ...: ]
    ...: dtype = [('issue_date', 'datetime64[ns]'),
    ...:          ('start_date', 'datetime64[D]'),
    ...:          ('end_date', 'datetime64[D]'),
    ...:          ('value', float)]
    ...: #

In [33]: # What I see in London, UK
    ...: recordset = np.array(rs, dtype=dtype)
    ...: df = pd.DataFrame(recordset)
    ...: df = df.set_index('issue_date')
    ...: df
    ...: 
Out[33]: 
                    start_date            end_date   value
issue_date                                                
2000-01-17 2000-02-01 00:00:00 2000-02-29 00:00:00  0.1203
2000-01-26 2000-02-01 00:00:00 2000-02-29 00:00:00  0.1369
2000-01-18 2000-03-01 00:00:00 2000-03-31 00:00:00  0.1122
2000-02-25 2000-03-01 00:00:00 2000-03-31 00:00:00  0.1425

In [34]: # What my colleague sees in Auckland, NZ
    ...: recordset = np.array(rs, dtype=dtype)
    ...: df = pd.DataFrame(recordset)
    ...: df = df.set_index('issue_date')
    ...: df
    ...: 
Out[34]: 
                             start_date            end_date   value
issue_date                                                         
2000-01-16 11:00:00 2000-02-01 00:00:00 2000-02-29 00:00:00  0.1203
2000-01-25 11:00:00 2000-02-01 00:00:00 2000-02-29 00:00:00  0.1369
2000-01-17 11:00:00 2000-03-01 00:00:00 2000-03-31 00:00:00  0.1122
2000-02-24 11:00:00 2000-03-01 00:00:00 2000-03-31 00:00:00  0.1425


Oh dear!
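
To show where the 11:00 on the previous day comes from, here is a minimal
sketch using only the standard library. It is not numpy's actual code path;
it just reproduces the arithmetic of reading a naive string as local time and
then storing it as UTC. The fixed UTC+13 offset is an assumption standing in
for Auckland's daylight-saving offset in January.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Auckland is on NZDT (UTC+13) in January. A fixed offset is used
# here instead of a real timezone database lookup.
NZDT = timezone(timedelta(hours=13))

# The string from the database carries no timezone information.
naive = datetime.strptime('2000-01-17 00:00:00.000000', '%Y-%m-%d %H:%M:%S.%f')

# Treat the naive value as local Auckland time, then convert to UTC.
as_local = naive.replace(tzinfo=NZDT)
as_utc = as_local.astimezone(timezone.utc)

print(as_utc.strftime('%Y-%m-%d %H:%M:%S'))  # 2000-01-16 11:00:00
```

London is at UTC+0 in January, so the same steps leave the value unchanged
there, which is why the two outputs differ.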

This isn't acceptable for my use case (in a multinational company), and the
only reasonable workaround I found was to bypass the numpy conversion
entirely: set the dtype to object, parse the strings manually and create an
array from the resulting list of datetime objects.
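
A rough sketch of that workaround, for illustration only. The helper name
parse_naive and the two format strings are assumptions based on the sample
rows above, not the exact code I used.

```python
from datetime import datetime

import numpy as np

# Issue-date strings as they come back from the database (no timezone).
issue_strings = [
    '2000-01-17 00:00:00.000000',
    '2000-01-26 00:00:00.000000',
    '2000-01-18 00:00:00.000000',
    '2000-02-25 00:00:00.000000',
]

def parse_naive(text):
    """Parse a timestamp or date string into a naive datetime.

    Hypothetical helper: no timezone is attached or applied, so the
    wall-clock value in the string is kept exactly as written.
    """
    for fmt in ('%Y-%m-%d %H:%M:%S.%f', '%Y-%m-%d'):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError('unrecognised date string: %r' % (text,))

# dtype=object keeps the datetime objects as they are, so numpy's own
# string-to-datetime64 conversion is never invoked.
issue_dates = np.array([parse_naive(s) for s in issue_strings], dtype=object)
```

Because the strings never pass through numpy's parser, the result does not
depend on the timezone of the machine running the code.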

Regards,
Dave


