[Numpy-discussion] fixing up datetime

Christopher Barker Chris.Barker@noaa....
Tue Jun 7 11:53:04 CDT 2011


Pierre GM wrote:
> Using the ISO as reference, you have a good definition of months.

Yes, but only one. There are others. For instance, the climate modelers 
like to use a calendar that has 360 days a year: twelve 30-day months. 
That way they get something with the same timescale as months and years, 
but with nice, linear, easy-to-use units (differentiable, and all that).
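
To make the "linear" point concrete, here is a minimal sketch of day 
counting in such an idealized 360-day calendar. The function name and 
the epoch_year parameter are made up for illustration; this is not any 
particular climate library's API.

```python
def days_since_epoch_360(year, month, day, epoch_year=0):
    """Day count in an idealized 360-day calendar (twelve 30-day months).

    Hypothetical helper for illustration only.
    """
    if not (1 <= month <= 12 and 1 <= day <= 30):
        raise ValueError("a 360-day calendar has months 1-12 and days 1-30")
    return (year - epoch_year) * 360 + (month - 1) * 30 + (day - 1)


# Every month has the same length, so month and year offsets are linear.
one_month = days_since_epoch_360(2000, 3, 1) - days_since_epoch_360(2000, 2, 1)
one_year = days_since_epoch_360(2001, 1, 1) - days_since_epoch_360(2000, 1, 1)
print(one_month, one_year)  # 30 360
```

In this calendar "1 month" is always exactly 30 days, which is what 
makes it convenient for model time steps.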

Mark Wiebe wrote:
>     Code    Interpreted as
>     Y       12M, 52W, 365D
>     M       4W, 30D, 720h
> 
>     This is even self inconsistent:
> 
>     1Y == 365D
> 
>     1Y == 12M == 12 * 30D == 360D
> 
>     1Y == 12M == 12 * 4W == 12 * 4 * 7D == 336D
> 
>     1Y == 52W == 52 * 7D == 364D
> 
>     Is it not clear from this what a mess of mis-interpretation might result
>     from all that?
> 
> 
> This part of the code is used for mapping metadata like [Y/4] -> [3M], 
> or [Y/26] -> [2W]. I agree that this '/' operator in the unit metadata 
> is weird, and wouldn't object to removing it.

Weird, dangerous, and unnecessary. I can see how some data may be on, 
for example, quarters, but that should require a more precise 
definition of a quarter.
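
The inconsistency quoted above is easy to reproduce. This small sketch 
just multiplies out the conversion factors from the quoted table; the 
dictionary and its labels are only for illustration.

```python
# Conversion factors as listed in the quoted table (Y -> 12M, 52W, 365D;
# M -> 4W, 30D). Each entry is one path from "1 year" down to days.
DAYS_PER_WEEK = 7
year_in_days = {
    "1Y == 365D": 365,
    "1Y == 12M == 12 * 30D": 12 * 30,
    "1Y == 12M == 12 * 4W": 12 * 4 * DAYS_PER_WEEK,
    "1Y == 52W": 52 * DAYS_PER_WEEK,
}

for path, days in year_in_days.items():
    print(f"{path:24s} -> {days}D")

# Four paths, four different answers for the length of one year.
distinct = sorted(set(year_in_days.values()))
print(distinct)  # [336, 360, 364, 365]
```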

>     This goes to heck if the data is expressed in something like "months
>     since 1995-01-01"
> 
>     Because months are only defined on a Calendar.
> 
> 
> Here's what the current implementation can do with that one:
> 
>  >>> np.datetime64('1995-01-01', 'M') + 13
> numpy.datetime64('1996-02','M')

I see -- I have a better idea of the intent here, and I can see that as 
long as you keep everything in the same unit (say, months, in this 
case), then this can be a clean and effective way to deal with this sort 
of data.

As I said, the netcdf case is a different use case, but I think the 
issue there was that the creator of the data was thinking of it as being 
used like the above: "months since January, 1995". With all-integer 
values for months, that makes perfect sense and is well defined.

The problem in that case is that the standard does not have a 
specification that enforces that the units stay months, and that the 
intervals are integers -- so software looked at that, converted it to, 
for example, Python datetime instances (using some pre-defined 
definition for the length of a month), and got something that 
misrepresented the data.
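
Here is a rough sketch of that failure mode using only the standard 
library. Both helper functions are hypothetical, and the 30-day month is 
just one assumed "pre-defined" length (taken from the table quoted 
earlier), not what any specific netcdf tool does.

```python
from datetime import date, timedelta

EPOCH = date(1995, 1, 1)  # "months since 1995-01-01"


def add_calendar_months(start, months):
    """Step whole calendar months from a first-of-month date (hypothetical helper)."""
    if start.day != 1:
        raise ValueError("this sketch only handles first-of-month dates")
    total = start.year * 12 + (start.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def add_fixed_months(start, months, days_per_month=30):
    """Treat a month as a fixed number of days (the lossy interpretation)."""
    return start + timedelta(days=months * days_per_month)


calendar_result = add_calendar_months(EPOCH, 13)
fixed_result = add_fixed_months(EPOCH, 13)
print(calendar_result)  # 1996-02-01
print(fixed_result)     # 1996-01-26
```

The calendar interpretation lands on the first of February 1996, as the 
data's creator intended; the fixed-length interpretation lands in late 
January, in a different month altogether.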

The numpy use-case is different, but it's my concern that that same kind 
of thing could easily happen, because people want to write generic code 
that deals with arbitrary np.datetime64 instances.

I suppose we could consider this analogous to issues with integer and 
floating point dtypes -- when you convert between those, it's 
user-beware, but I think that would be clearer if we had a set of dtypes:

datetime_months
datetime_hours
datetime_seconds

But that list would get big in a hurry!

Also, with the Python datetime module, for instance, what I like about 
it is that I don't have to know or care how it's stored internally -- 
all I need to know is what range and precision it can deal with. numpy 
has performance constraints that may make that impossible, but I still 
like it.

Maybe two types:

datetime_calendar: for Calendar-type units (months, business days, ...)

datetime_continuous: for "linear units" (seconds, hours, ...)

or something like that?
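
To illustrate the kind of separation I mean, here is a toy sketch in 
plain Python. CalendarDelta and ContinuousDelta are invented names, not 
existing or proposed numpy types; the only point is that each kind adds 
to itself but mixing the two is refused rather than silently converted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalendarDelta:
    """Hypothetical: a whole number of calendar months."""
    months: int

    def __add__(self, other):
        if not isinstance(other, CalendarDelta):
            return NotImplemented  # let Python raise TypeError
        return CalendarDelta(self.months + other.months)


@dataclass(frozen=True)
class ContinuousDelta:
    """Hypothetical: a fixed-length span measured in seconds."""
    seconds: int

    def __add__(self, other):
        if not isinstance(other, ContinuousDelta):
            return NotImplemented  # let Python raise TypeError
        return ContinuousDelta(self.seconds + other.seconds)


print(CalendarDelta(12) + CalendarDelta(1))         # CalendarDelta(months=13)
print(ContinuousDelta(3600) + ContinuousDelta(60))  # ContinuousDelta(seconds=3660)

# Mixing calendar and continuous units has no well-defined answer,
# so the sketch refuses instead of guessing a month length.
try:
    CalendarDelta(1) + ContinuousDelta(86400)
    mixed_refused = False
except TypeError:
    mixed_refused = True
print(mixed_refused)  # True
```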


-Chris





-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov

