[Numpy-discussion] RFC: A proposal for implementing some date/time types in NumPy
Fri Jul 11 12:52:32 CDT 2008
On Friday 11 July 2008, Christopher Barker wrote:
> Francesc Alted wrote:
> > We are planning to implement some date/time types for NumPy,
> A couple questions/comments:
> > ``datetime64``
> > - Expressed in microseconds since POSIX epoch (January 1, 1970).
> > - Resolution: nanoseconds.
> how is that possible? Is that a typo?
Exactly. This should read *microseconds*. I've sent the corrected
> > This will be compatible with the Python ``datetime`` module
> very important!
> > Observations::
> > This will not be fully compatible with the Python
> > ``datetime`` module, either in terms of precision or time-span.
> > However, getters and setters will be provided for it (losing
> > precision or overflowing as needed).
> How do you propose handling overflow? Would it raise an exception?
Yes. We propose to use exactly the same exception handling as NumPy
(so it will be configurable by the user).
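To illustrate what "configurable by the user" means in NumPy today, here is a minimal sketch using the existing ``numpy.errstate`` context manager on a float64 overflow. How the proposed date/time types would hook into this machinery is not settled here, so the float array below is only a stand-in for a date/time array.

```python
import numpy as np

# Stand-in data: a float64 value close to the largest representable one.
big = np.array([1e308])

# Policy 1: ignore overflow; the result silently becomes inf.
with np.errstate(over="ignore"):
    ignored = big * 10.0

# Policy 2: ask NumPy to raise on overflow instead.
raised = False
with np.errstate(over="raise"):
    try:
        big * 10.0
    except FloatingPointError:
        raised = True

print("ignored policy gives:", ignored[0])
print("raise policy raised an exception:", raised)
```

The same operation either passes silently or raises, depending only on the error state chosen by the user.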
> Another option would be to have a version that stored the datetime in
> two values: say two int64s or something (kind of like complex numbers
> are handled). This would allow a long time span and nanosecond (or
> finer) precision. I guess it would require a bunch of math code to be
> written, however.
I suppose so, yes. Besides, this certainly violates the requirement of
having a fast implementation (unless we want to use a lot of time
optimizing such a 'complex' date/time type). There is also the problem
of requiring more space. See later.
> > * ``timefloat64``
> > - Resolution: 1 microsecond (for +-32 years from epoch) or 14
> > digits (for distant years from epoch). So the precision is
> > *variable*.
> I'm not sure this is that useful, exactly for that reason. What's the
> motivation for it? I can see using a float for timedelta -- as, in
> general, you'll need less precision the longer your time span, but
> having precision depend on how far you happen to be from the epoch
> seems risky (though for anything I do, it wouldn't matter in the
Well, as I said before, we wanted this mainly for
geological/astronomical uses, but as this type has the property of
having microsecond resolution during the years [1902 - 2038], it would
definitely be useful for many other cases too.
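The variable precision discussed above can be made concrete with a small sketch. It assumes, purely for illustration, that the float64 holds *seconds* since the epoch (the unit is an assumption here) and uses ``numpy.spacing`` to get the gap between adjacent representable values. The helper name ``resolution_at`` is hypothetical.

```python
import numpy as np

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, good enough for a sketch


def resolution_at(years_from_epoch):
    """Gap (in seconds) between adjacent float64 values at this distance.

    Assumes the time is stored as float64 seconds since the epoch.
    """
    seconds = years_from_epoch * SECONDS_PER_YEAR
    return float(np.spacing(np.float64(seconds)))


for years in (1, 30, 1e6, 1e9):
    print(f"{years:>12g} years from epoch -> {resolution_at(years):.3g} s")
```

Close to the epoch the gap is well below a microsecond, while at geological distances it grows to whole seconds, which is the trade-off being weighed here.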
I can say that Postgres, for one, implements a datetime type based on
a float64 by default (although you can choose an int64 at compilation
time) with exactly the same properties as ``timefloat64``. So, if
Postgres is doing this, it should definitely be useful in many use
> > Example of use
> > In : t = datetime.datetime.now() # setter in action
> > In : t
> > Out: 733234384724 # representation as an int64 (scalar)
> hmm - could it return a numpy.datetime object instead, rather than a
> straight int64? I'd like to see a representation that is clearly
Could be. But we should not forget that we are implementing the type for
an array package, and the output can become cumbersome very soon.
What I wanted to avoid here was having this:
[datetime(2008, 7, 11, 19, 16, 10, 996509), datetime(2008, 7, 11, 19,
16, 10, 996535), datetime(2008, 7, 11, 19, 16, 10, 996547),
datetime(2008, 7, 11, 19, 16, 10, 996559), datetime(2008, 7, 11, 19,
16, 10, 996568), dtype="datetime64"]
I prefer to see this:
[733234000000, 733234000000, 733234000000, 733234000000, 733234000000,
Hmm, although for a scalar representation, I agree that this is a bit
too terse. Maybe adding a 'T' (meaning 'T'ime type) at the end would
In : t
[733234000000T, 733234000000T, 733234000000T, 733234000000T,
But it would be interesting to see what other people think.
> > About the ``mx.DateTime`` module
> > --------------------------------
> > In this document, the emphasis has been put on comparing the
> > compatibility of future NumPy date/time types against the
> > ``datetime`` module that comes with Python. Should we consider the
> > compatibility with mx.DateTime as well?
> No. The whole point of python's standard datetime is to have a common
> system with which to deal with date-time values -- it's too bad it
> didn't come sooner, so that mx.DateTime could have been built on it,
> but at this point, I think supporting the standard lib one is most
> I couldn't find documentation (not quickly, anyway) of how the
> datetime object stores its data internally, but it might be nice to
> support that protocol directly -- maybe that would make for too much
> math code to write, though.
The internal format for the datetime module is documented in the
sources, and at first sight, supporting the protocol shouldn't be too
> What about timedelta types?
Well, we have deliberately left timedelta out because we think that any
of the three proposed types can act as a timedelta (this is also
another reason for keeping the proposed representation, i.e. not showing
year/month/day/etc. info). In fact, they represent an absolute time
only by the convention of placing the origin of time at the UNIX
epoch. If you don't impose this convention on your array, all of the
timetypes can represent timedeltas.
However, I suppose that there is a problem with the getters and setters
here, that is, how external ``datetime`` timedeltas interact with the
new NumPy date/time types. Thinking a bit, the setter should be
relatively easy to implement:
In : numpy.datetime64(datetime.timedelta(12))
Out : 12T
For the getter, one can think of adding a new method (only available for
the date/time types):
In : t = numpy.datetime64(datetime.timedelta(12))
In : t.totimedelta()
Out : datetime.timedelta(12)
IMO, that would solve the issue without having to implement specific
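The setter/getter pair for timedeltas can be sketched in plain Python. This is only an illustration of the conversion, not the proposed NumPy API: the helper names ``timedelta_to_us`` and ``us_to_timedelta`` are hypothetical, and the unit (integer microseconds, as for ``datetime64``) is an assumption.

```python
import datetime

_ONE_US = datetime.timedelta(microseconds=1)


def timedelta_to_us(td):
    """Setter direction: datetime.timedelta -> integer microseconds."""
    # Floor division of two timedeltas yields an exact Python int.
    return td // _ONE_US


def us_to_timedelta(us):
    """Getter direction: integer microseconds -> datetime.timedelta."""
    return datetime.timedelta(microseconds=us)


ticks = timedelta_to_us(datetime.timedelta(days=12))
round_trip = us_to_timedelta(ticks)
print(ticks, round_trip)
```

Because ``datetime.timedelta`` itself has microsecond resolution, the round trip is exact under this assumption as long as the value fits in the target integer type.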
> My final thought is that while I see that different applications need
> different properties, having multiple representations seems like it
> will introduce a lot of maintenance, documentation and support
> issues. Maybe a single, more complicated representation would be a
> better bet (like using two ints, rather than one, to get both range
> and precision)
Yeah, but besides the fact that the implementation would be considerably
slower, a struct of two 'int64' values would take twice the space of the
proposed timetypes, and this can be a killer for a package that is meant
for dealing with large arrays of data. [Incidentally, I was even
pondering introducing a 32-bit date/time type precisely to save
space, but as the usability of such a type would be really restricted,
in the end I've opted not to include it].
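The space argument is easy to check with existing NumPy dtypes. In this sketch the two-field structured dtype merely stands in for a hypothetical "two int64" date/time type; the field names ``hi`` and ``lo`` are made up for the example.

```python
import numpy as np

# One int64 per element, as in the proposed integer-based time types.
single = np.dtype(np.int64)

# Hypothetical two-int64 layout (field names are illustrative only).
pair = np.dtype([("hi", np.int64), ("lo", np.int64)])

n = 1_000_000
a = np.zeros(n, dtype=single)
b = np.zeros(n, dtype=pair)

print("bytes per element:", single.itemsize, "vs", pair.itemsize)
print("bytes for", n, "elements:", a.nbytes, "vs", b.nbytes)
```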
> Thanks for working on this -- I think it will be a great addition to
Thanks for the excellent feedback too!