[Numpy-discussion] The date/time dtype and the casting issue

Tom Denniston tom.denniston@alum.dartmouth....
Tue Jul 29 12:21:39 CDT 2008


The datetime proposal is very impressive in its depth and thought.
For me, as for many other people, it would be a massive improvement
to numpy and would allow numpy to get a foothold in areas like
econometrics, where R/S is now dominant.

I had one question regarding casting of strings:

I think it would be ideal if things like the following worked:

>>> series = numpy.array(['1970-02-01','1970-09-01'], dtype = 'datetime64[D]')
>>> series == '1970-02-01'
[True, False]

I view this as similar to:

>>> series = numpy.array([1,2,3], dtype=float)
>>> series == 2
[False, True, False]

1. That comparison works: numpy recognizes that an int is comparable
with a float and does the cast to float.  I think you want the same
behavior between date arrays and strings that parse into dates.  Some
might object that the relationship between strings and dates is more
tenuous than that between floats and ints, which is true.  But I have
used my own homespun date array extension to numpy for over a year,
and the first thing I did was wrap it in an object that handles these
string->date translations elegantly.  That made it infinitely more
usable from an IPython session.
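A minimal pure-Python sketch of the kind of wrapper described in point 1, assuming elements are stored as `datetime.date` objects and that only ISO `YYYY-MM-DD` strings are accepted. The class `DateSeries` is a hypothetical illustration, not part of numpy or the proposal: its `==` parses a string operand into a date before comparing elementwise, much as numpy casts an int to float before comparing it against a float array.

```python
from datetime import date
from typing import Iterable, List, Union


class DateSeries:
    """Hypothetical date array that casts ISO date strings on comparison."""

    def __init__(self, values: Iterable[Union[str, date]]) -> None:
        # Normalize every element to a datetime.date up front.
        self._values: List[date] = [self._coerce(v) for v in values]

    @staticmethod
    def _coerce(value: Union[str, date]) -> date:
        # Strings are parsed as ISO 'YYYY-MM-DD'; dates pass through unchanged.
        if isinstance(value, str):
            return date.fromisoformat(value)
        return value

    def __eq__(self, other: Union[str, date]) -> List[bool]:
        # Elementwise comparison: cast a string operand to a date first,
        # analogous to numpy casting an int to float before comparing.
        target = self._coerce(other)
        return [v == target for v in self._values]


series = DateSeries(['1970-02-01', '1970-09-01'])
print(series == '1970-02-01')  # [True, False]
```

Casting only at comparison time keeps the stored data strongly typed while still letting an interactive session type dates as plain strings.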

2. Even more important to me, however, is the issue of date parsing.
The mx library does many things badly, but it does a great job of
parsing dates in many formats.  When you parse '1/1/95' or '1995-01-01'
it knows that you mean 19950101, which is really nice.  I believe the
scipy timeseries code for parsing dates is based on it.  I would
highly recommend starting with that level of functionality.  The one
major issue with it is that an uninterpretable date doesn't raise an
error but silently becomes the current date.  That is obviously
undesirable.
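A small sketch of the parsing behavior asked for in point 2, using only the standard library's `datetime.strptime`. This is not mx's algorithm; it assumes a short, hypothetical list of accepted formats (`FORMATS`) tried in order, reads ambiguous slash dates month-first, and, unlike the fallback described above, raises `ValueError` on an uninterpretable string instead of returning the current date. The function name `parse_date` is likewise hypothetical.

```python
from datetime import date, datetime

# Hypothetical accepted formats, tried in order; the first match wins.
# Slash dates are read month-first (US style), e.g. '1/2/95' -> Jan 2.
FORMATS = ('%Y-%m-%d', '%m/%d/%y', '%m/%d/%Y', '%Y%m%d')


def parse_date(text: str) -> date:
    """Parse text with the first matching format, raising on failure."""
    stripped = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(stripped, fmt).date()
        except ValueError:
            continue  # this format does not fit; try the next one
    # Never fall back to "now": an unparseable string is an error.
    raise ValueError(f'unrecognized date string: {text!r}')


print(parse_date('1/1/95'))                         # 1995-01-01
print(parse_date('1995-01-01').strftime('%Y%m%d'))  # 19950101
```

With `%y`, `strptime` maps two-digit years 69 to 99 onto 1969 to 1999, which is why '95' becomes 1995.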

3. Finally, my current implementation stores dates as floats and uses
NaN to represent an invalid date.  When you assign None to an element
of a date array, it stores NaN; when you assign a real date, it stores
the equivalent floating point value.  I have found this hugely
beneficial and wanted to suggest reserving a value in the date type
to serve as the equivalent of the floating point NaN.  People might
prefer masked arrays as a solution, but I just wanted to float the
idea.
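A rough sketch of the float-backed scheme in point 3, assuming dates are stored as days since 1970-01-01 in a plain Python list. The class `FloatDateArray` and its `raw` method are hypothetical, not the author's actual extension: assigning `None` stores NaN, assigning a `datetime.date` stores its float day count, and reading a NaN slot gives back `None`.

```python
import math
from datetime import date, timedelta
from typing import List, Optional

EPOCH = date(1970, 1, 1)  # assumed origin for the day count


class FloatDateArray:
    """Hypothetical date array storing days since EPOCH as floats."""

    def __init__(self, size: int) -> None:
        # Every slot starts out invalid (NaN).
        self._days: List[float] = [math.nan] * size

    def __setitem__(self, index: int, value: Optional[date]) -> None:
        # None becomes NaN; a real date becomes its float day count.
        if value is None:
            self._days[index] = math.nan
        else:
            self._days[index] = float((value - EPOCH).days)

    def __getitem__(self, index: int) -> Optional[date]:
        # NaN reads back as None; anything else converts back to a date.
        days = self._days[index]
        if math.isnan(days):
            return None
        return EPOCH + timedelta(days=days)

    def raw(self) -> List[float]:
        # Copy of the underlying float storage, for inspection.
        return list(self._days)


arr = FloatDateArray(2)
arr[0] = date(1970, 2, 1)
arr[1] = None
print(arr.raw())  # [31.0, nan]
```

A NaN sentinel propagates through arithmetic and never compares equal to anything, so an invalid date can never accidentally match a real one; a masked array gets the same effect with a separate mask instead. Later NumPy releases reserve such a value in `datetime64`, written `numpy.datetime64('NaT')`.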

Forgive me if any of this has already been covered.  There has been a
lot of volume on this subject and I've tried to read it all diligently
but may have missed a point or two.

