[Numpy-discussion] Default unit for datetime/timedelta

Pierre GM pgmdevlist@gmail....
Wed Jun 8 18:31:09 CDT 2011

On Jun 9, 2011, at 1:10 AM, Mark Wiebe wrote:
> > >>> np.timedelta64(10, 's') + 10
> > numpy.timedelta64(20,'s')
> Here, the unit is defined: 's'
>  For the first operand, the inconsistency is with the second. Here's the reasoning I didn't spell out:
> We're adding a timedelta + int, so let's convert 10 into a timedelta. No units specified, so it's
> 10 microseconds, so we add 10 seconds and 10 microseconds, not 10 seconds and 10 seconds.

Ah OK. I think that your approach of taking the defined unit (here, s) as the unit of the undefined term (here, 10) is by far the best.
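
To make the two readings concrete, here is a minimal sketch. The helper name add_unitless is hypothetical and is not part of NumPy; it only spells out the "borrow the unit of the defined operand" rule by hand, and contrasts it with the "unit-less means microseconds" reading.

```python
import numpy as np

def add_unitless(delta, n):
    """Add a bare integer to a timedelta64, borrowing the timedelta's unit.

    Hypothetical helper for illustration only, not a NumPy function.
    """
    # np.datetime_data returns (unit, count), e.g. ('s', 1) for 'm8[s]'.
    unit, count = np.datetime_data(delta.dtype)
    # A bare n is read as n steps of the operand's own unit.
    return delta + np.timedelta64(n * count, unit)

# Reading 1: the integer takes the unit of the other operand (seconds).
same_unit = add_unitless(np.timedelta64(10, 's'), 10)

# Reading 2: the integer is silently treated as microseconds.
microsecond_default = np.timedelta64(10, 's') + np.timedelta64(10, 'us')

print(same_unit)
print(microsecond_default)
```

The first result is 20 seconds; the second is 10 seconds plus 10 microseconds, which is the surprising outcome described in the quoted reasoning.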

> >OK, here it is not. But the result makes sense... Up to a certain point. If you try to guess the unit from a date given as a string, what happens in case of ambiguities? Or do you restrict an input string to be strictly ISO8601 to remove those?
> Yeah, I'm restricting the string to be (almost) strictly ISO8601. For supporting other formats, I think creating a 'fancy_date_parser' function or something like that would be better than having all those date string format ambiguities in the core type.

Quite OK. But this 'fancy_date_parser' will likely crash at some point if the unit cannot be guessed properly. But you're right, that's not the issue here.
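
As a rough illustration of what such a helper could look like, here is a sketch. Everything in it is an assumption: fancy_date_parser does not exist in NumPy, and the list of accepted formats is made up for the example. It tries a few non-ISO layouts with the standard library, rewrites the match as an ISO 8601 string, and lets np.datetime64 pick the unit from that string.

```python
from datetime import datetime
import numpy as np

# (input layout, ISO 8601 layout handed to NumPy); illustrative choices only.
FORMATS = (
    ("%d/%m/%Y", "%Y-%m-%d"),
    ("%d/%m/%Y %H:%M", "%Y-%m-%dT%H:%M"),
)

def fancy_date_parser(text):
    """Hypothetical lenient parser living outside the core datetime64 type."""
    for input_format, iso_format in FORMATS:
        try:
            parsed = datetime.strptime(text, input_format)
        except ValueError:
            continue  # this layout does not match, try the next one
        # The unit is inferred from how much of the ISO string is present.
        return np.datetime64(parsed.strftime(iso_format))
    raise ValueError("cannot guess a date format for %r" % text)

print(fancy_date_parser("12/03/2011"))
print(fancy_date_parser("12/03/2011 13:30"))
```

Note that "12/03/2011" is read day-first here only because the format list says so; a month-first reader would get a different date, which is exactly the kind of ambiguity that is better kept out of the core type.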

> > I'd like to make 'M8' and 'm8' be datetime data types with generic time units instead of microseconds as they are currently. This would also allow the possibility of extending the behavior of detecting the unit from the input string as:
> >
> > >>> np.datetime64('2011-03-12T13')
> > numpy.datetime64('2011-03-12T13-0600','h')
> >
> > to also work with arrays, which currently work like this:
> >
> > >>> np.array(['2011-03-12T13', '2012'], dtype='M8')
> > array(['2011-03-12T13:00:00.000000-0600', '2011-12-31T18:00:00.000000-0600'], dtype='datetime64[us]')
> Why is the second one not '2012-01-01T00:00:00-0600' ?
> This is because dates are stored at midnight UTC, and when converted to local time for the default time-based printing, that changes slightly.
> ISO8601 specifies to interpret an input in local time if no "Z" or timezone offset is given, so that's why the first one matches. I haven't been able to think of a way around it other than putting warnings in the documentation, and have made 'today' and 'now' throw errors if you try to use them as times or dates respectively.

I see the logic, but I don't like it at all. I would expect the date to be stored in the local time zone by default (that is, if no other time zone info is available). 
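
The shift in the array example can be reproduced with the standard library alone. This is only a sketch of the arithmetic, not of NumPy's internals, and it assumes a fixed UTC-6 offset to match the output quoted above.

```python
from datetime import datetime, timedelta, timezone

# Assumed local zone for the example: a fixed offset of UTC-6.
central = timezone(timedelta(hours=-6))

# A bare date such as '2012' is stored as midnight UTC ...
stored = datetime(2012, 1, 1, tzinfo=timezone.utc)
# ... and then displayed in local time, six hours earlier.
shown = stored.astimezone(central)
print(shown.isoformat())

# A time without an offset is read as local time, so it round-trips unchanged.
local_input = datetime(2011, 3, 12, 13, tzinfo=central)
round_trip = local_input.astimezone(timezone.utc).astimezone(central)
print(round_trip.isoformat())
```

The first print gives 2011-12-31T18:00:00-06:00, the second 2011-03-12T13:00:00-06:00. Storing dates at local midnight instead, as suggested above, would make the first case print 2012-01-01T00:00:00-06:00.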
