[Numpy-discussion] fixing up datetime
Thu Jun 9 13:27:56 CDT 2011
Mark Wiebe wrote:
> Because datetime64 is a NumPy data type, it needs a well-defined rule
> for these kinds of conversions. Treating datetimes as moments in time
> instead of time intervals makes a very nice rule which appears to be
> very amenable to a variety of computations, which is why I like the [...]
This really is the key issue that I've been harping on (sorry...) in this
thread. For many uses, a datetime as a moment in time is a great abstraction,
and I think that is how most datetime implementations (like the std lib one)
are designed.
However, when you are trying to represent/work with data like monthly
averages and the like, you need something that represents something else
-- and trying to use the same mechanism as for time instants, and hoping
that the ambiguities will resolve themselves from the context, is dangerous.
I don't work in finance, so I'm not sure about things like bi-monthly
payments -- it seems those could well be defined as instants -- the
payments are due on a given day each month (say the 1st and 15th), and
I assume that is well defined to the instant -- i.e. before the end of
the day in some time zone (note that that instant would be time
23:59:59.99999..., rather than zero, however). The trick with these comes
in when you do the math -- the timedelta issue -- what is a 1-month
timedelta? It's NOT a fixed number of days, hours, etc.
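A quick std-lib illustration of the point (a sketch, making no assumptions about the NumPy design under discussion): the number of days spanned by "one month" depends on which month you start from.

```python
import calendar

# A "1 month" delta has no fixed length in days: the span depends on
# which month (and year) you start from.
days_in = {m: calendar.monthrange(2011, m)[1] for m in (1, 2, 4)}
print(days_in)  # January: 31 days, February: 28, April: 30
```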
I don't know that anyone has time to do this, but it seems a written-up
set of use cases would help focus this conversation -- I know I've
pretty much lost track of what uses we are trying to support.
Can you instantiate a datetime64 with something other than a string?
i.e. with a (year, [month], [day], [hour], [second], [usecond]) tuple?
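For reference, in released NumPy (which postdates this thread) there is no tuple constructor, but a Python datetime.datetime or datetime.date object works in addition to a string -- a sketch:

```python
import datetime
import numpy as np

# From an ISO-format string (unit inferred from the string's precision):
a = np.datetime64('2011-06-09T13:27:56')

# From a std lib datetime object (stored at microsecond precision):
b = np.datetime64(datetime.datetime(2011, 6, 9, 13, 27, 56))

# Comparison works across the two units, and the values are equal.
print(a == b)
```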
> The fact that it's a NumPy dtype probably is the biggest limiting
> factor preventing parameters like 'start' and 'end' during conversion.
> Having a datetime represent an instant in time neatly removes any
> ambiguity, so converting between days and seconds as a unit is
> analogous to converting between int32 and float32.
Sure, but I don't know that that is the best way to go -- integers are
precisely defined, and generally used such that 3 == 3.00000000. That's
not the case for months, at least not if a month is supposed to be a
monthly-average sort of thing.
This reminds me of a recent question on this list -- someone was using
np.histogram() to bin integer values, and was surprised at the results
-- what they needed to do was define the bin intervals as floating
point numbers to get what they wanted: 0.5, 1.5, 2.5, rather than
1, 2, 3, 4, because what they really wanted was a categorical definition
of an integer, NOT a truncated floating point number. I'm not sure how
that informs this conversation, though...
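That histogram surprise is easy to reproduce -- this is a sketch of the behavior described, not the original poster's code:

```python
import numpy as np

data = [1, 2, 3, 4]

# Default bins treat the values as points on a continuum: with edges
# [1, 2, 3, 4], the last bin is closed, so 3 and 4 land together.
counts, edges = np.histogram(data, bins=3)
print(counts, edges)   # [1 1 2] [1. 2. 3. 4.]

# Half-integer edges treat each integer as its own category.
counts, edges = np.histogram(data, bins=[0.5, 1.5, 2.5, 3.5, 4.5])
print(counts)          # [1 1 1 1]
```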
> > >>> np.timedelta64(10, 's') + 10
> > numpy.timedelta64(20,'s')
> Here, the unit is defined: 's'
> For the first operand, the inconsistency is with the second. Here's
> the reasoning I didn't spell out:
> We're adding a timedelta + int, so lets convert 10 into a timedelta.
> No units specified, so it's
> 10 microseconds, so we add 10 seconds and 10 microseconds, not 10
> seconds and 10 seconds.
This sure seems ripe for error to me -- if datetimes and timedeltas are
going to be represented in various possible units, then I don't think
it's a good idea to allow one to add a bare integer -- especially if the
unit can be inferred from the input data, rather than specified.
"Explicit is better than implicit."
"In the face of ambiguity, refuse the temptation to guess."
If you must allow this, then using the default for the unspecified unit
as above is the way to go.
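For what it's worth, the behavior that eventually shipped in released NumPy (checked against a modern NumPy; this thread predates the final design) interprets the bare integer in the timedelta's own unit:

```python
import numpy as np

td = np.timedelta64(10, 's')

# The bare integer takes on the timedelta's unit: 10 s + 10 s, not
# 10 s + 10 us.
print(td + 10)   # 20 seconds
```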
Dave Hirschfeld wrote:
>> Here are some current behaviors that are inconsistent with the microsecond
>> default, but consistent with the "generic time unit" idea:
>> >>> np.timedelta64(10, 's') + 10
> That is what I would expect (and hope) would happen. IMO an integer should be
> cast to the dtype ([s]) of the datetime/timedelta.
This is way too ripe for error, particularly if we have the unit
auto-determined from input data.
Not to take us back to a probably already resolved issue, but maybe all
this unit conversion could and should be avoided by following the python
datetime approach -- all datetimes and timedeltas are always defined
with microsecond precision -- period.
Maybe that would give up some computational efficiencies we want.
This would also preclude any use of these dtypes for work that requires
greater precision, but does anyone really need both year, month, day
specification AND nanoseconds? Given all the leap-second issues, that
seems a bit ridiculous.
But it would make things easier.
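For comparison, the std lib really does pin everything at a single fixed resolution -- one microsecond:

```python
import datetime

# Both datetime and timedelta advertise microseconds as their smallest
# representable step; there are no per-object units to reconcile.
print(datetime.datetime.resolution)    # 0:00:00.000001
print(datetime.timedelta.resolution)   # 0:00:00.000001
```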
I note that in this entire conversation, almost all the talk has been
about finance examples -- I think I'm the only one who has brought up
science use, and only barely (and mostly the simple cases). So do we
really need the same dtype to be useful for both finance and particle
physics?
Christopher Barker, Ph.D.
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception