[Numpy-discussion] fixing up datetime

Mark Wiebe mwwiebe@gmail....
Thu Jun 2 12:15:10 CDT 2011


On Thu, Jun 2, 2011 at 11:57 AM, Christopher Barker
<Chris.Barker@noaa.gov> wrote:

> Mark Wiebe wrote:
> > I'm following what I understand the NEP to mean for combining dates and
> > deltas of different units. This means for timedeltas, the metadata
> > becomes more precise, in particular it becomes the GCD of the input
> > metadata, and between timedelta and datetime the datetime always
> > dominates.
> >
> >
> https://github.com/numpy/numpy/blob/master/doc/neps/datetime-proposal.rst
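
A rough sketch of the GCD rule described above, restricted to the linear
units. The table `NS_PER_UNIT` and the helpers `combined_step_ns` and
`add_timedeltas` are hypothetical names made up for illustration; they are
not NumPy API, and the (multiplier, unit) pair is only an assumed stand-in
for the dtype metadata.

```python
from math import gcd

# Length of each linear unit in nanoseconds. Y, M and B are left out on
# purpose: they have no fixed length, so a GCD is not defined for them.
NS_PER_UNIT = {
    "ns": 1,
    "us": 10**3,
    "ms": 10**6,
    "s": 10**9,
    "m": 60 * 10**9,
    "h": 3600 * 10**9,
    "D": 86400 * 10**9,
}


def combined_step_ns(meta_a, meta_b):
    """GCD of two (multiplier, unit) metadata pairs, in nanoseconds."""
    (mult_a, unit_a), (mult_b, unit_b) = meta_a, meta_b
    return gcd(mult_a * NS_PER_UNIT[unit_a], mult_b * NS_PER_UNIT[unit_b])


def add_timedeltas(count_a, meta_a, count_b, meta_b):
    """Add two timedeltas; return (count, step_ns) at the combined step."""
    step = combined_step_ns(meta_a, meta_b)
    total_ns = (count_a * meta_a[0] * NS_PER_UNIT[meta_a[1]]
                + count_b * meta_b[0] * NS_PER_UNIT[meta_b[1]])
    # Both inputs are exact multiples of the GCD, so this division is exact.
    return total_ns // step, step


# 1 hour + 30 minutes: the result is counted in 1-minute steps.
count, step = add_timedeltas(1, (1, "h"), 30, (1, "m"))
print(count, step // NS_PER_UNIT["m"])
```

With multipliers, metadata of 15 minutes and 10 minutes would combine to a
5-minute step under the same rule.
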
>
> Thanks for posting this link -- a few comments on that doc follow.
>
> > Only Years, Months, and Business Days have a nonlinear relationship with
> > the other units, so they're the only problem case for this. They can be
> > arbitrarily special-cased based on what is decided to make the most
> > sense.
>
> As mentioned in my recent post -- this stuff should be handled by some
> sort of "calendar" classes -- there is no one way to do that! So numpy
> should provide datetime and timedelta data types that can be used, but a
> timedelta should _not_ ever be defined by these weird variable units.
>
> I guess what I'm getting at is that:
>
> a_date_time + a_timedelta
>
> is a fundamentally different operation than:
>
> a_date_time + a_calendar_defined_timespan
>
> The former can follow all the usual math properties for addition, but
> the latter doesn't.
>

It is possible to implement the system so that if you don't use Y/M/B,
things work out unambiguously, but if you do use them you get a behavior
that's a little weird, but with rules to eliminate the calendar-created
ambiguities. For the business day unit, what I'm currently trying to do is
get an assessment of whether my proposed design is the right abstraction to
support all the use cases of people who want it.
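
To make the difference between the two additions concrete, here is a
minimal sketch using only the Python standard library. The helper
`add_months` is hypothetical, and its rule (clamp the day to the length of
the target month) is just one possible way to eliminate the ambiguity, not
the rule proposed for NumPy.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's end."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


start = date(2011, 1, 31)

# Fixed-length timedeltas: how the steps are grouped does not matter.
one_step = start + timedelta(days=60)
two_steps = (start + timedelta(days=30)) + timedelta(days=30)
print(one_step == two_steps)

# Calendar months: the grouping changes the answer under this rule.
direct = add_months(start, 2)                    # Jan 31 -> Mar 31
stepwise = add_months(add_months(start, 1), 1)   # Jan 31 -> Feb 28 -> Mar 28
print(direct, stepwise)
```

Any other clamping or rollover rule would give different dates, but the
grouping would still matter for some inputs, which is the sense in which
the behavior is "a little weird".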

> About the NEP:
>
> """
> A representation is also supported such that the stored date-time
> integer can encode both the number of a particular unit as well as a
> number of sequential events tracked for each unit.
> """
>
> I'm not sure I understand what this really means, but I _think_ I agree
> with Pierre that this is unnecessary complication - couldn't it be
> handled by multiple arrays, or maybe a structured dtype?
>

I think it depends on its use cases, which unfortunately aren't actually
described in the NEP.


> """
> The datetime64 represents an absolute time. Internally it is represented
> as the number of time units between the intended time and the epoch
> (12:00am on January 1, 1970 --- POSIX time including its lack of leap
> seconds).
> """
>
> The CF netcdf metadata standard provides for times to be specified as
> "units since a_date_time". units can be seconds, hours, days, etc (it
> does allow months and years, but it shouldn't!). This is a nice, flexible
> system that makes it easy to capture wildly different scales needed:
> from nanoseconds to millennia. Similarly, we might want to consider a
> datetime dtype as containing a reference datetime, and a tic unit.
>
> I think the "Time units" section does specify that you can use various
> units, but it looks like the NEP sticks with the single POSIX epoch.
>
> I see later in the NEP:
> """
> However, after thinking more about this, we found that the combination
> of an absolute datetime64 with a relative timedelta64 does offer the
> same functionality while removing the need for the additional origin
> metadata. This is why we have removed it from this proposal.
> """
> hmmm -- I don't think that's the case -- you need the "origin" if you
> want to represent something like nanoseconds as a datetime, far away
> from the epoch. Sure, you can supply your own by keeping the origin and
> a timedelta array separately, but you could do that for all uses, also,
> and the point of this is to make working with datetimes easy. If we're
> going to allow different units, we might as well have different "origins".
>

I rather agree here; adding the 'origin' back in is definitely worth
considering. How is the origin represented in the CF netcdf code?
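
For reference, a rough sketch of what a datetime carrying a reference
datetime and a tick unit could look like, modeled on the "units since
a_date_time" convention quoted above. The names `TICK`,
`parse_units_since` and `decode` are hypothetical, the accepted string
format is a simplifying assumption, and this is not a full implementation
of the CF conventions.

```python
from datetime import datetime, timedelta

# Tick lengths for linear units only; months and years are deliberately
# not supported because they have no fixed length.
TICK = {
    "seconds": timedelta(seconds=1),
    "minutes": timedelta(minutes=1),
    "hours": timedelta(hours=1),
    "days": timedelta(days=1),
}


def parse_units_since(spec):
    """Split 'UNIT since YYYY-MM-DD HH:MM:SS' into (tick, origin)."""
    unit, _, origin_text = spec.partition(" since ")
    return TICK[unit], datetime.strptime(origin_text, "%Y-%m-%d %H:%M:%S")


def decode(counts, spec):
    """Turn integer tick counts into absolute datetimes."""
    tick, origin = parse_units_since(spec)
    return [origin + n * tick for n in counts]


times = decode([0, 6, 30], "hours since 2011-06-01 00:00:00")
for t in times:
    print(t.isoformat())
```

The stored values stay small integers near the chosen origin, which is the
property that matters for fine units far from the POSIX epoch. Python's
`datetime` only resolves microseconds, so this sketch cannot show the
nanosecond case directly.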


> I also don't think that units like "month", "year", "business day"
> should be allowed -- it just adds confusion. It's not a killer if they
> are defined in the spec:
>
> 1 year = 365.25 days (for instance)
> 1 month = 1 year/12
>
> But I think it's better to simply disallow them, and keep that use for
> what I'm calling the "Calendar" functions. And "business day" is
> particularly ugly, and, I'm sure, defined differently in different places.


So using the calendar specified by ISO 8601 as the default for the
calendar-based functions is undesirable? I think supporting it to a small
extent is reasonable, and support for any other calendars or more advanced
calendar-based functions would go in support libraries. Having something
calendar-like is already implied by calling the type "datetime" instead of
just "timepoint" or something like that.
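
As an illustration of how the two readings of "year" diverge, here is a
small sketch with the Python standard library, whose `datetime` follows
the proleptic Gregorian calendar (the calendar ISO 8601 uses). The
constants `FIXED_YEAR` and `FIXED_MONTH` are hypothetical names for the
fixed-length definitions quoted above.

```python
from datetime import datetime, timedelta

# Fixed-length definitions: 1 year = 365.25 days, 1 month = 1 year / 12.
FIXED_YEAR = timedelta(days=365.25)
FIXED_MONTH = FIXED_YEAR / 12

start = datetime(2011, 1, 1)

# Fixed-length year: lands six hours into the next calendar year,
# because 2011 has only 365 days.
fixed = start + FIXED_YEAR

# Calendar year: same month, day and time, one year later.
by_calendar = start.replace(year=start.year + 1)

print(fixed.isoformat())
print(by_calendar.isoformat())
print(FIXED_MONTH)
```

Under the fixed-length definition a "month" is 30 days and 10.5 hours, so
neither unit lines up with calendar boundaries.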

-Mark