[Numpy-discussion] RFC: A (second) proposal for implementing some date/time types in NumPy

Francesc Alted faltet@pytables....
Wed Jul 16 11:44:36 CDT 2008


Hi,

After the tons of excellent feedback received on our first proposal for
date/time types in NumPy, Ivan and I have had another brainstorming
session and ended up with a new proposal for your consideration.

While this one does not incorporate each and every one of the
suggestions you have made, we think that it represents a fair balance
between capabilities and simplicity, and that it can be a solid and
efficient basis for building more date/time niceties on top of it
(read: a full-fledged ``DateTime`` array class).

Although the proposal is not complete, the essentials are there.
So, please read on.  We will be glad to hear your opinions.

Thanks!

-- 
Francesc Alted

====================================================================
 A (second) proposal for implementing some date/time types in NumPy
====================================================================

:Author: Francesc Alted i Abad
:Contact: faltet@pytables.com
:Author: Ivan Vilata i Balaguer
:Contact: ivan@selidor.net
:Date: 2008-07-16


Executive summary
=================

A date/time mark is something very handy to have in many fields where
one has to deal with data sets.  While Python has several modules that
define a date/time type (like the integrated ``datetime`` [1]_ or
``mx.DateTime`` [2]_), NumPy lacks them.

In this document, we are proposing the addition of a series of date/time
types to fill this gap.  The requirements for the proposed types are
twofold: 1) they have to be fast to operate with and 2) they have to
be as compatible as possible with the existing ``datetime`` module that
comes with Python.


Types proposed
==============

To start with, it is virtually impossible to come up with a single
date/time type that meets the needs of every use case.  So, after
pondering different possibilities, we have settled on *two*
different types, namely ``datetime64`` and ``timedelta64`` (these names
are preliminary and can be changed), that can have different resolutions
so as to cover different needs.

**Important note:** the resolution is conceived here as metadata that
  *complements* a date/time dtype, *without changing the base type*.

A detailed description of the proposed types follows.


``datetime64``
--------------

It represents a time that is absolute (i.e. not relative).  It is
implemented internally as an ``int64`` type.  The internal epoch is the
POSIX epoch (see [3]_).

Resolution
~~~~~~~~~~

It accepts different resolutions, and each resolution supports a
different time span.  The table below lists the supported resolutions
with their corresponding time spans.

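+----------------------+----------------------------------+
|     Resolution       |         Time span (years)        |
+-------+--------------+                                  |
| Code  |   Meaning    |                                  |
+=======+==============+==================================+
|   Y   |  year        |      [9.2e18 BC, 9.2e18 AD]      |
+-------+--------------+----------------------------------+
|   Q   |  quarter     |      [2.3e18 BC, 2.3e18 AD]      |
+-------+--------------+----------------------------------+
|   M   |  month       |      [7.6e17 BC, 7.6e17 AD]      |
+-------+--------------+----------------------------------+
|   W   |  week        |      [1.7e17 BC, 1.7e17 AD]      |
+-------+--------------+----------------------------------+
|   d   |  day         |      [2.5e16 BC, 2.5e16 AD]      |
+-------+--------------+----------------------------------+
|   h   |  hour        |      [1.0e15 BC, 1.0e15 AD]      |
+-------+--------------+----------------------------------+
|   m   |  minute      |      [1.7e13 BC, 1.7e13 AD]      |
+-------+--------------+----------------------------------+
|   s   |  second      |      [2.9e11 BC, 2.9e11 AD]      |
+-------+--------------+----------------------------------+
|   ms  |  millisecond |      [ 2.9e8 BC,  2.9e8 AD]      |
+-------+--------------+----------------------------------+
|   us  |  microsecond |      [290301 BC, 294241 AD]      |
+-------+--------------+----------------------------------+
|   ns  |  nanosecond  |      [  1678 AD,   2262 AD]      |
+-------+--------------+----------------------------------+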
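
The spans for the fixed-length units follow from the ``int64`` range:
each resolution can count at most 2**63 - 1 units on either side of the
epoch.  A minimal sketch of that arithmetic, assuming a mean Gregorian
year of 365.2425 days; the helper ``span_in_years`` and the unit table
are illustrative names, not part of the proposal:

```python
# Seconds per unit for the fixed-length resolutions (illustrative table;
# the calendar units Y, Q and M are left out because their length varies).
SECONDS_PER_UNIT = {
    "W": 7 * 86400,
    "d": 86400,
    "h": 3600,
    "m": 60,
    "s": 1,
    "ms": 1e-3,
    "us": 1e-6,
    "ns": 1e-9,
}

# Assumption: mean Gregorian year of 365.2425 days.
SECONDS_PER_YEAR = 365.2425 * 86400

INT64_MAX = 2**63 - 1


def span_in_years(code):
    """Approximate number of years reachable on each side of the epoch."""
    return INT64_MAX * SECONDS_PER_UNIT[code] / SECONDS_PER_YEAR


if __name__ == "__main__":
    for code in SECONDS_PER_UNIT:
        print(f"{code:>2}: +/- {span_in_years(code):.2g} years")
```

For nanoseconds this gives roughly 292 years, which is where the
1678 to 2262 range around the 1970 epoch comes from.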

Building a ``datetime64`` dtype
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The proposed ways to specify the resolution in the dtype constructor
are:

Using parameters in the constructor::

  dtype('datetime64', res="us")  # the default res. is microseconds

Using the long string notation::

  dtype('datetime64[us]')   # equivalent to dtype('datetime64')

Using the short string notation::

  dtype('T8[us]')   # equivalent to dtype('T8')
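
All the string notations are meant to name the same dtype, with the
bracketed resolution optional.  As a rough sketch of how such strings
could be split into a kind and a resolution (the function
``parse_time_dtype`` is hypothetical and only mirrors the names used in
this proposal; it is not an existing NumPy API):

```python
import re

# Long and short type names as used in this proposal.
_KINDS = {
    "datetime64": "datetime64",
    "T8": "datetime64",
    "timedelta64": "timedelta64",
    "t8": "timedelta64",
}
_PATTERN = re.compile(r"^(?P<name>[A-Za-z0-9]+)(?:\[(?P<res>[A-Za-z]+)\])?$")


def parse_time_dtype(spec, default_res="us"):
    """Split a spec such as 'datetime64[ms]' or 'T8' into (kind, resolution)."""
    match = _PATTERN.match(spec)
    if match is None or match.group("name") not in _KINDS:
        raise ValueError(f"not a date/time dtype string: {spec!r}")
    # A missing bracket falls back to the default resolution (microseconds).
    return _KINDS[match.group("name")], match.group("res") or default_res


print(parse_time_dtype("datetime64"))   # ('datetime64', 'us')
print(parse_time_dtype("T8[us]"))       # ('datetime64', 'us')
```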

Compatibility issues
~~~~~~~~~~~~~~~~~~~~

``datetime64`` will be fully compatible with the ``datetime`` class of
the ``datetime`` module of Python only when using a resolution of
microseconds.  For other resolutions, the conversion process will
lose precision or will overflow as needed.
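
To make the precision point concrete, here is a small sketch of the
conversion between ``datetime.datetime`` and an integer tick count
relative to the POSIX epoch.  The helpers ``to_ticks`` and
``from_ticks`` are hypothetical, time zones are ignored (naive
datetimes only), and truncation by floor division is an assumption
rather than something the proposal specifies:

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

# Microseconds per tick for a few resolutions (illustrative subset).
US_PER_TICK = {"s": 1_000_000, "ms": 1_000, "us": 1}


def to_ticks(dt, res):
    """Count whole ticks of resolution `res` between the epoch and `dt`."""
    delta = dt - EPOCH
    total_us = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
    return total_us // US_PER_TICK[res]   # floor division drops the remainder


def from_ticks(ticks, res):
    """Rebuild a datetime.datetime from a tick count."""
    return EPOCH + datetime.timedelta(microseconds=ticks * US_PER_TICK[res])


stamp = datetime.datetime(2008, 7, 16, 13, 39, 25, 315123)

# Microsecond resolution round-trips exactly.
assert from_ticks(to_ticks(stamp, "us"), "us") == stamp

# Millisecond resolution silently drops the last three digits.
assert from_ticks(to_ticks(stamp, "ms"), "ms") == stamp.replace(microsecond=315000)
```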


``timedelta64``
---------------

It represents a time that is relative (i.e. not absolute).  It is
implemented internally as an ``int64`` type.

Resolution
~~~~~~~~~~

It accepts different resolutions, and each resolution supports a
different time span.  The table below lists the supported resolutions
with their corresponding time spans.

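+----------------------+--------------------------+
|     Resolution       |         Time span        |
+-------+--------------+                          |
| Code  |   Meaning    |                          |
+=======+==============+==========================+
|   W   |  week        |      +- 1.7e17 years     |
+-------+--------------+--------------------------+
|   D   |  day         |      +- 2.5e16 years     |
+-------+--------------+--------------------------+
|   h   |  hour        |      +- 1.0e15 years     |
+-------+--------------+--------------------------+
|   m   |  minute      |      +- 1.7e13 years     |
+-------+--------------+--------------------------+
|   s   |  second      |      +- 2.9e11 years     |
+-------+--------------+--------------------------+
|   ms  |  millisecond |      +- 2.9e8 years      |
+-------+--------------+--------------------------+
|   us  |  microsecond |      +- 2.9e5 years      |
+-------+--------------+--------------------------+
|   ns  |  nanosecond  |      +- 292 years        |
+-------+--------------+--------------------------+
|   ps  |  picosecond  |      +- 106 days         |
+-------+--------------+--------------------------+
|   fs  |  femtosecond |      +- 2.6 hours        |
+-------+--------------+--------------------------+
|   as  |  attosecond  |      +- 9.2 seconds      |
+-------+--------------+--------------------------+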

Building a ``timedelta64`` dtype
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The proposed ways to specify the resolution in the dtype constructor
are:

Using parameters in the constructor::

  dtype('timedelta64', res="us")  # the default res. is microseconds

Using the long string notation::

  dtype('timedelta64[us]')   # equivalent to dtype('timedelta64')

Using the short string notation::

  dtype('t8[us]')   # equivalent to dtype('t8')

Compatibility issues
~~~~~~~~~~~~~~~~~~~~

``timedelta64`` will be fully compatible with the ``timedelta`` class
of the ``datetime`` module of Python only when using a resolution of
microseconds.  For other resolutions, the conversion process will
lose precision or will overflow as needed.
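
The overflow case can be illustrated with the standard library alone:
``datetime.timedelta`` cannot hold more than 999999999 days, so a valid
``int64`` count at a coarse resolution may not fit.  The helper
``ticks_to_timedelta`` below is a hypothetical sketch, not the proposed
conversion routine:

```python
import datetime

# timedelta keyword argument for each resolution code (illustrative subset).
_KEYWORD = {
    "W": "weeks",
    "D": "days",
    "h": "hours",
    "m": "minutes",
    "s": "seconds",
    "ms": "milliseconds",
    "us": "microseconds",
}


def ticks_to_timedelta(ticks, res):
    """Convert an integer tick count into a datetime.timedelta."""
    return datetime.timedelta(**{_KEYWORD[res]: ticks})


print(ticks_to_timedelta(24, "s"))   # a 24-second interval

try:
    # 2**62 weeks fits in an int64 but is far beyond what timedelta can hold.
    ticks_to_timedelta(2**62, "W")
except OverflowError as exc:
    print("overflow:", exc)
```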


Example of use
==============

Here is an example of using ``datetime64``::

  In [10]: t = numpy.zeros(5, dtype="datetime64[ms]")

  In [11]: t[0] = datetime.datetime.now()  # setter in action

  In [12]: t[0]
  Out[12]: '2008-07-16T13:39:25.315'   # representation in ISO 8601 format

  In [13]: print t
  [2008-07-16T13:39:25.315  1970-01-01T00:00:00.0
  1970-01-01T00:00:00.0  1970-01-01T00:00:00.0  1970-01-01T00:00:00.0]

  In [14]: t[0].item()     # getter in action
  Out[14]: datetime.datetime(2008, 7, 16, 13, 39, 25, 315000)

  In [15]: print t.dtype
  datetime64[ms]

And here is an example of using ``timedelta64``::

  In [8]: t1 = numpy.zeros(5, dtype="datetime64[s]")

  In [9]: t2 = numpy.ones(5, dtype="datetime64[s]")

  In [10]: t = t2 - t1

  In [11]: t[0] = 24  # setter in action (setting to 24 seconds)

  In [12]: t[0]
  Out[12]: 24       # representation as an int64

  In [13]: print t
  [24  1  1  1  1]

  In [14]: t[0].item()     # getter in action
  Out[14]: datetime.timedelta(0, 24)

  In [15]: print t.dtype
  timedelta64[s]


Operating with date/time arrays
===============================

``datetime64`` vs ``datetime64``
--------------------------------

The only operation allowed between absolute dates is subtraction::

  In [10]: numpy.ones(5, "T8") - numpy.zeros(5, "T8")
  Out[10]: array([1, 1, 1, 1, 1], dtype=timedelta64[us])

But not other operations::

  In [11]: numpy.ones(5, "T8") + numpy.zeros(5, "T8")
  TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'numpy.ndarray'

``datetime64`` vs ``timedelta64``
---------------------------------

It will be possible to add and subtract relative times from absolute
dates::

  In [10]: numpy.zeros(5, "T8[Y]") + numpy.ones(5, "t8[Y]")
  Out[10]: array([1971, 1971, 1971, 1971, 1971], dtype=datetime64[Y])

  In [11]: numpy.ones(5, "T8[Y]") - 2 * numpy.ones(5, "t8[Y]")
  Out[11]: array([1969, 1969, 1969, 1969, 1969], dtype=datetime64[Y])

But not other operations::

  In [12]: numpy.ones(5, "T8[Y]") * numpy.ones(5, "t8[Y]")
  TypeError: unsupported operand type(s) for *: 'numpy.ndarray' and 'numpy.ndarray'
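
The rules stated so far for operations involving absolute dates can be
summarised as a small lookup table.  This is only a sketch of the
type-level behaviour; ``result_kind`` is a hypothetical helper, and any
combination the text does not explicitly allow is treated as an error
here:

```python
def result_kind(op, left, right):
    """Return the kind produced by `left op right`, or raise TypeError."""
    D, T = "datetime64", "timedelta64"
    rules = {
        ("-", D, D): T,   # absolute - absolute -> relative
        ("+", D, T): D,   # absolute + relative -> absolute
        ("-", D, T): D,   # absolute - relative -> absolute
    }
    try:
        return rules[(op, left, right)]
    except KeyError:
        raise TypeError(
            f"unsupported operand type(s) for {op}: {left!r} and {right!r}"
        ) from None


print(result_kind("-", "datetime64", "datetime64"))   # timedelta64
print(result_kind("+", "datetime64", "timedelta64"))  # datetime64
```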

``timedelta64`` vs anything
---------------------------

Finally, it will be possible to operate with relative times as if they
were regular int64 dtypes *as long as* the result can be converted back
into a ``timedelta64``::

  In [10]: numpy.ones(5, 't8')
  Out[10]: array([1, 1, 1, 1, 1], dtype=timedelta64[us])

  In [11]: (numpy.ones(5, 't8[M]') + 2) ** 3
  Out[11]: array([27, 27, 27, 27, 27], dtype=timedelta64[M])

But::

  In [12]: numpy.ones(5, 't8') + 1j
  TypeError: The result cannot be converted into a ``timedelta64``


dtype/resolution conversions
============================

To change the date/time dtype of an existing array, we propose using
the ``.astype()`` method.  This will be mainly useful for changing
resolutions.

For example, for absolute dates::

  In[10]: t1 = numpy.zeros(5, dtype="datetime64[s]")

  In[11]: print t1
  [1970-01-01T00:00:00  1970-01-01T00:00:00  1970-01-01T00:00:00
   1970-01-01T00:00:00  1970-01-01T00:00:00]

  In[12]: print t1.astype('datetime64[d]')
  [1970-01-01  1970-01-01  1970-01-01  1970-01-01  1970-01-01]

For relative times::

  In[10]: t1 = numpy.ones(5, dtype="timedelta64[s]")

  In[11]: print t1
  [1 1 1 1 1]

  In[12]: print t1.astype('timedelta64[ms]')
  [1000 1000 1000 1000 1000]

Converting directly between relative and absolute dtypes will not be
supported::

  In[13]: numpy.zeros(5, dtype="datetime64[s]").astype('timedelta64')
  TypeError: data type cannot be converted to the desired type
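
Since the underlying values are plain integers, a resolution change
amounts to rescaling the tick counts.  A minimal sketch on Python
lists, assuming floor division when moving to a coarser unit (the
proposal does not say how rounding should work, and ``convert_ticks``
is an illustrative name):

```python
# Nanoseconds per tick, restricted to fixed-length resolutions
# (calendar units such as M or Y are omitted from this sketch).
NS_PER_TICK = {
    "d": 86_400 * 10**9,
    "h": 3_600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}


def convert_ticks(ticks, src, dst):
    """Re-express integer tick counts from resolution `src` in resolution `dst`."""
    num, den = NS_PER_TICK[src], NS_PER_TICK[dst]
    # Assumption: floor division when the target unit is coarser.
    return [t * num // den for t in ticks]


print(convert_ticks([1, 1, 1], "s", "ms"))           # [1000, 1000, 1000]
print(convert_ticks([0, 86_399, 86_400], "s", "d"))  # [0, 0, 1]
```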


Final considerations
====================

Why the ``origin`` metadata disappeared
---------------------------------------

During the discussion of the date/time dtypes in the NumPy list, the
idea of having an ``origin`` metadata that complemented the definition
of the absolute ``datetime64`` was initially found to be useful.

However, after thinking more about this, Ivan and I find that the
combination of an absolute ``datetime64`` with a relative
``timedelta64`` offers the same functionality while removing the
need for the additional ``origin`` metadata.  This is why we have
removed it from this proposal.
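
The same idea can be shown with the standard ``datetime`` module: a
custom origin is just one absolute value, and data measured from it are
relative values.  The dates below are made up for illustration:

```python
import datetime

# Hypothetical experiment start, playing the role of a custom origin.
origin = datetime.datetime(2008, 7, 16, 12, 0, 0)

# Measurements stored as offsets from that origin, here in seconds.
offsets = [datetime.timedelta(seconds=s) for s in (0, 30, 90)]

# Absolute times are recovered by adding the origin back.
absolute = [origin + off for off in offsets]

# Subtracting the origin returns to the relative view.
assert [t - origin for t in absolute] == offsets

print(absolute[-1].isoformat())   # 2008-07-16T12:01:30
```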


Resolution and dtype issues
---------------------------

The date/time dtype's resolution metadata cannot be used in general as
part of typical dtype usage.  For example, in::

  numpy.zeros(5, dtype=numpy.datetime64)

we have yet to find a sensible way to pass the resolution.  Perhaps the
following would work::

  numpy.zeros(5, dtype=numpy.datetime64(res='Y'))

but we are not sure if this would collide with the spirit of the NumPy
dtypes.

At any rate, one can always do::

  numpy.zeros(5, dtype=numpy.dtype('datetime64', res='Y'))

BTW, prior to all of this, one should also determine whether::

  numpy.dtype('datetime64', res='Y')

or::

   numpy.dtype('datetime64[Y]')
   numpy.dtype('T8[Y]')

would be a consistent way to instantiate a dtype in NumPy.  We really do
think that it could be a good way, but we would need to hear the opinion
of the expert.  Travis?



.. [1] http://docs.python.org/lib/module-datetime.html
.. [2] http://www.egenix.com/products/python/mxBase/mxDateTime
.. [3] http://en.wikipedia.org/wiki/Unix_time


.. Local Variables:
.. mode: rst
.. coding: utf-8
.. fill-column: 72
.. End:

