[Numpy-discussion] Default unit for datetime/timedelta
Wed Jun 8 18:10:26 CDT 2011
On Wed, Jun 8, 2011 at 5:48 PM, Pierre GM <firstname.lastname@example.org> wrote:
> On Jun 8, 2011, at 11:05 PM, Mark Wiebe wrote:
> > The NEP and current implementation of the datetime specify microseconds
> > as the default unit when constructing and converting to datetimes and
> AFAIU, the default is [us] when otherwise unspecified.
> > Here are some current behaviors that are inconsistent with the microsecond
> > default, but consistent with the "generic time unit" idea:
> > >>> np.timedelta64(10, 's') + 10
> > numpy.timedelta64(20,'s')
> Here, the unit is defined: 's'
The unit is defined for the first operand; the inconsistency is with the
second. Here's the reasoning I didn't spell out:
We're adding a timedelta + int, so let's convert 10 into a timedelta. No
units are specified, so with a microsecond default it's 10 microseconds,
and we add 10 seconds and 10 microseconds, not 10 seconds and 10 seconds.
The intuitive result of 20 seconds, which is what the NEP specifies for +,
follows naturally from having generic units, but not from having a default
of microseconds.
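To make the two readings concrete, here is a minimal sketch in plain Python. It is a toy model, not NumPy's implementation: a timedelta is just a (value, unit) pair, None stands in for the generic unit, and the function names are made up for illustration.

```python
# Toy model only: (value, unit) pairs, with unit None meaning "generic".
US_PER_UNIT = {"us": 1, "ms": 1_000, "s": 1_000_000}


def add_with_us_default(a, b):
    """Treat a unitless operand as microseconds, then add in microseconds."""
    value_a, unit_a = a
    value_b, unit_b = b
    unit_a = unit_a or "us"
    unit_b = unit_b or "us"
    total_us = value_a * US_PER_UNIT[unit_a] + value_b * US_PER_UNIT[unit_b]
    return (total_us, "us")


def add_with_generic_unit(a, b):
    """Let a generic operand adopt the unit of the other operand."""
    value_a, unit_a = a
    value_b, unit_b = b
    if unit_a is not None and unit_b is not None and unit_a != unit_b:
        # Real unit conversion is deliberately left out of this sketch.
        raise NotImplementedError("unit conversion is left out of this sketch")
    return (value_a + value_b, unit_a or unit_b)


ten_seconds = (10, "s")
bare_ten = (10, None)  # the plain integer 10, with no unit attached

print(add_with_us_default(ten_seconds, bare_ten))    # (10000010, 'us')
print(add_with_generic_unit(ten_seconds, bare_ten))  # (20, 's')
```

Only the second function gives the 20 seconds that the NEP describes for +.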
> > >>> np.datetime64('2011-03-12') + 3
> > numpy.datetime64('2011-03-15','D')
> OK, here it is not. But the result makes sense... Up to a certain point. If
> you try to guess the unit from a date given as a string, what happens in
> case of ambiguities? Or do you restrict an input string to be strictly
> ISO8601 to remove those?
Yeah, I'm restricting the string to be (almost) strictly ISO8601. For
supporting other formats, I think creating a 'fancy_date_parser' function or
something like that would be better than having all those date string format
ambiguities in the core type.
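As a rough illustration of why a strict format removes the ambiguity, here is a sketch of unit detection in plain Python. This is not the parser in NumPy; the function name detect_unit is hypothetical, and it only covers a small subset of ISO 8601 (no timezone offsets, no fractional seconds).

```python
import re

# Each pattern accepts exactly one level of precision, so a string that
# parses at all maps to exactly one unit.
_ISO_PATTERNS = [
    (r"\d{4}", "Y"),
    (r"\d{4}-\d{2}", "M"),
    (r"\d{4}-\d{2}-\d{2}", "D"),
    (r"\d{4}-\d{2}-\d{2}T\d{2}", "h"),
    (r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}", "m"),
    (r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}", "s"),
]


def detect_unit(text):
    """Return the unit implied by the precision of an ISO-style string."""
    for pattern, unit in _ISO_PATTERNS:
        if re.fullmatch(pattern, text):
            return unit
    # Anything else (for example '03/12/2011') is rejected rather than guessed.
    raise ValueError(f"not a supported ISO 8601 date string: {text!r}")


print(detect_unit("2011-03-12"))     # D
print(detect_unit("2011-03-12T13"))  # h
print(detect_unit("2012"))           # Y
```

Looser formats would go through a separate parsing function, so the guessing stays out of the core type.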
> > I'd like to make 'M8' and 'm8' be datetime data types with generic time
> > units instead of microseconds as they are currently. This would also allow
> > the possibility of extending the behavior of detecting the unit from the
> > input string as:
> > >>> np.datetime64('2011-03-12T13')
> > numpy.datetime64('2011-03-12T13-0600','h')
> > to also work with arrays, which currently work like this:
> > >>> np.array(['2011-03-12T13', '2012'], dtype='M8')
> > array(['2011-03-12T13:00:00.000000-0600',
> >        '2011-12-31T18:00:00.000000-0600'], dtype='datetime64[us]')
> Why is the second one not '2012-01-01T00:00:00-0600'?
This is because dates are stored at midnight UTC, and when converted to
local time for the default time-based printing, the displayed value shifts
by the local UTC offset (here, back six hours into the previous day).
ISO 8601 says to interpret an input in local time if no "Z" or timezone
offset is given, so the first one, which includes a time, prints as it was
entered. I haven't been able to think of a way around it other than putting
warnings in the documentation, and have made 'today' and 'now' throw errors
if you try to use them as times or dates respectively.
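The shift can be reproduced with the standard library alone. This is only a sketch of the arithmetic, not of the NumPy code, and it assumes a fixed -0600 local offset to match the output quoted above:

```python
from datetime import datetime, timedelta, timezone

# The date '2012' is stored as midnight UTC on 2012-01-01.
stored = datetime(2012, 1, 1, tzinfo=timezone.utc)

# Assumed local zone: a fixed offset of -06:00, as in the printed output.
local_zone = timezone(timedelta(hours=-6))

# Printing in local time moves the value back six hours.
shown = stored.astimezone(local_zone)
print(shown.isoformat())  # 2011-12-31T18:00:00-06:00
```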
> Otherwise, I'm all for it.