[Numpy-discussion] fixing up datetime

Christopher Barker Chris.Barker@noaa....
Thu Jun 9 13:27:56 CDT 2011


Mark Wiebe wrote:
> Because datetime64 is a NumPy data type, it needs a well-defined rule 
> for these kinds of conversions. Treating datetimes as moments in time 
> instead of time intervals makes a very nice rule which appears to be 
> very amenable to a variety of computations, which is why I like the 
> approach.

This really is the key issue that I've been harping on (sorry...) this 
whole thread:

For many uses, a datetime as a moment in time is a great abstraction, 
and I think that is how most datetime implementations (like the std lib 
one) are used.

However, when you are trying to represent/work with data like monthly 
averages and the like, you need something that represents something else 
-- and trying to use the same mechanism as for time instants, and hoping 
that the ambiguities will resolve themselves from the context, is dangerous.

I don't work in finance, so I'm not sure about things like bi-monthly 
payments -- it seems those could well be defined as instants -- the 
payments are due on a given day each month (say the 1st and 15th), and 
I assume that is well defined to the instant -- i.e. before the end of 
the day in some time zone. (Note that the time of day would then be 
23:59:59.99999 rather than zero, however.) The trick with these comes in 
when you do math -- the timedelta issue -- what is a 1 month timedelta? 
It's NOT a given number of days, hours, etc.
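
To make the 1-month question concrete, here is a minimal sketch. It 
assumes datetime64 as it behaves in current NumPy releases, which may 
not match the code under discussion in this thread. It casts month 
starts to day resolution and measures the gaps between them:

```python
import numpy as np

# First-of-month values for four consecutive months of 2011.
month_starts = np.array(['2011-01', '2011-02', '2011-03', '2011-04'],
                        dtype='datetime64[M]')

# Casting to day resolution gives the first day of each month;
# the differences between them are the month lengths in days.
days_per_month = np.diff(month_starts.astype('datetime64[D]')).astype(int)
print(days_per_month)  # [31 28 31]
```

So "one month" is 31, 28 or 31 days depending on where you start, which 
is why it can't be converted to a fixed count of smaller units.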

I don't know that anyone has time to do this, but it seems a written-up 
set of use-cases would help focus this conversation -- I know I've 
pretty much lost track of what uses we are trying to support.

another question:

can you instantiate a datetime64 with something other than a string? 
i.e. a (year, [month], [day], [hour], [second], [usecond]) tuple?
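
For what it's worth, here is a sketch of one way to get there, assuming 
current NumPy releases, where np.datetime64 accepts a stdlib 
datetime.datetime object. The helper name datetime64_from_tuple is 
hypothetical, not a NumPy function, and the tuple follows the stdlib 
order (year, month, day, hour, minute, second, microsecond):

```python
import datetime as dt
import numpy as np

def datetime64_from_tuple(parts):
    """Hypothetical helper: build a datetime64 from a (year, ...) tuple."""
    parts = tuple(parts)
    # datetime.datetime needs at least year, month, day; default to 1.
    if len(parts) < 3:
        parts = parts + (1,) * (3 - len(parts))
    return np.datetime64(dt.datetime(*parts))

stamp = datetime64_from_tuple((2011, 6, 9, 13, 27, 56))
new_year = datetime64_from_tuple((2011,))
print(stamp, new_year)
```

Going through datetime.datetime means the result carries microsecond 
resolution no matter how few fields were given.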

> The fact that it's a NumPy dtype probably is the biggest limiting
> factor preventing parameters like 'start' and 'end' during conversion.
> Having a datetime represent an instant in time neatly removes any
> ambiguity, so converting between days and seconds as a unit is
> analogous to converting between int32 and float32.

Sure, but I don't know that that is the best way to go -- integers are 
precisely defined and generally used as 3 == 3.00000000. That's not the 
case for months, at least if it's supposed to be a monthly average-type 
representation.

This reminds me of a question recently on this list -- someone was using 
np.histogram() to bin integer values, and was surprised at the results 
-- what they needed to do was consider the bin intervals as floating 
point numbers to get what they wanted: 0.5, 1.5, 2.5, rather than 
1,2,3,4, because what they really wanted was a categorical definition 
of an integer, NOT a truncated floating point number. I'm not sure how 
that informs this conversation, though...
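
A small sketch of that histogram situation, with made-up sample data 
(the values below are an illustration, not the data from that thread):

```python
import numpy as np

# Made-up integer samples: one 1, two 2s, three 3s, four 4s.
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# Integer edges: the bins are [1,2), [2,3) and [3,4]. The last bin is
# closed on the right, so the 3s and 4s land together.
counts_int_edges, _ = np.histogram(data, bins=[1, 2, 3, 4])

# Half-integer edges: one bin centered on each integer value.
counts_half_edges, _ = np.histogram(data, bins=[0.5, 1.5, 2.5, 3.5, 4.5])

print(counts_int_edges)   # [1 2 7]
print(counts_half_edges)  # [1 2 3 4]
```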


 > > >>> np.timedelta64(10, 's') + 10
 > > numpy.timedelta64(20,'s')
 >
 > Here, the unit is defined: 's'
 >
 >  For the first operand, the inconsistency is with the second. Here's
 > the reasoning I didn't spell out:

 > We're adding a timedelta + int, so lets convert 10 into a timedelta.
 > No units specified, so it's
 > 10 microseconds, so we add 10 seconds and 10 microseconds, not 10
 > seconds and 10 seconds.

This sure seems ripe for error to me -- if a datetime and timedelta are 
going to be represented in various possible units, then I don't think 
it's a good idea to allow one to add an integer -- especially if the 
unit can be inferred from the input data, rather than specified.

"Explicit is better than implicit."

"In the face of ambiguity, refuse the temptation to guess."

If you must allow this, then using the default for the unspecified unit 
as above is the way to go.
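
The two readings of "+ 10" can be spelled out explicitly. This is a 
sketch assuming current NumPy releases; it deliberately avoids the bare 
integer, so it doesn't depend on how any particular version resolves it:

```python
import numpy as np

ten_seconds = np.timedelta64(10, 's')

# Reading 1: the bare 10 means the default unit (microseconds here).
as_default_unit = ten_seconds + np.timedelta64(10, 'us')

# Reading 2: the bare 10 takes on the unit of the other operand.
as_same_unit = ten_seconds + np.timedelta64(10, 's')

# The two readings differ by a factor of a million in the added part.
print(as_default_unit == np.timedelta64(10_000_010, 'us'))  # True
print(as_same_unit == np.timedelta64(20, 's'))              # True
```

With the unit written out on both operands, there is nothing left to 
guess.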

Dave Hirschfeld wrote:
>> Here are some current behaviors that are inconsistent with the microsecond
>> default, but consistent with the "generic time unit" idea:
>> >>> np.timedelta64(10, 's') + 10
>> numpy.timedelta64(20,'s')
> 
> That is what I would expect (and hope) would happen. IMO an integer should be
> cast to the dtype ([s]) of the datetime/timedelta.

This is way too ripe for error, particularly if we have the unit 
auto-determined from input data.

Not to take us back to a probably already resolved issue, but maybe all 
this unit conversion could and should be avoided by following the python 
datetime approach -- all datetimes and timedeltas are always defined 
with microsecond precision -- period.
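
For reference, a short sketch of how the stdlib behaves here, using 
only the standard datetime module:

```python
import datetime as dt

# The stdlib timedelta has one fixed resolution: a microsecond.
print(dt.timedelta.resolution == dt.timedelta(microseconds=1))  # True

# Mixed-size quantities just add; there is no unit to convert.
total = dt.timedelta(seconds=10) + dt.timedelta(microseconds=10)
print(total)  # 0:00:10.000010

# Adding a bare integer is refused rather than guessed at.
try:
    dt.timedelta(seconds=10) + 10
    int_add_allowed = True
except TypeError:
    int_add_allowed = False
print(int_add_allowed)  # False
```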

Maybe there are computational inefficiencies in that approach that we 
want to avoid.

This would also preclude any use of these dtypes for work that required 
greater precision, but does anyone really need both year, month, day 
specification AND nanoseconds? Given all the leap-second issues, that 
seems a bit ridiculous.
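
A back-of-the-envelope sketch of the precision/range trade-off, 
assuming the value is stored as a signed 64-bit count of units from an 
epoch (as datetime64 is) and using a 365.25-day year as an 
approximation:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # approximate year length

def span_in_years(units_per_second):
    """Approximate years representable on one side of the epoch."""
    return 2**63 / units_per_second / SECONDS_PER_YEAR

ns_years = span_in_years(10**9)  # nanosecond ticks
us_years = span_in_years(10**6)  # microsecond ticks

print(round(ns_years))  # about 292
print(round(us_years))  # about 292 thousand
```

So nanosecond ticks cover only a few centuries around the epoch, while 
microsecond ticks cover hundreds of thousands of years.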

But it would make things easier.

I note that in this entire conversation, all the talk has been about 
finance examples -- I think I'm the only one that has brought up science 
use, and that only barely (and mostly the simple cases). So do we really 
need to have the same dtype useful for finance and particle physics?


-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov

