[Numpy-discussion] The date/time dtype and the casting issue
Ivan Vilata i Balaguer
Tue Jul 29 13:59:19 CDT 2008
Tom Denniston (on 2008-07-29 at 12:21:39 -0500) said::
> I think it would be ideal if things like the following worked:
> >>> series = numpy.array(['1970-02-01','1970-09-01'], dtype = 'datetime64[D]')
> >>> series == '1970-02-01'
> [True, False]
> I view this as similar to:
> >>> series = numpy.array([1,2,3], dtype=float)
> >>> series == 2
> 1. However it does it, numpy recognizes that an int is comparable with a
> float and does the float cast. I think you want the same behavior
> between strings that parse into dates and date arrays. Some might
> object that the relationship between string and date is more tenuous
> than float and int, which is true, but having used my own homespun
> date array numpy extension for over a year, I've found that the first
> thing I did was wrap it into an object that handles these string->date
> translations elegantly and that made it infinitely more usable from an
> ipython session.
That may be feasible as long as there is a very clear rule for what time
units you get given a string. For instance, '1970' could yield years
and '1970-03-12T12:00' minutes, but then we don't have a way of creating
a time in business days... However, it looks interesting. Any more
people interested in this behaviour?
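To make the "very clear rule" idea concrete, here is a minimal sketch in plain Python of one possible rule: the most precise field present in an ISO 8601 string decides the unit. The function name `infer_unit` and the one-letter unit codes are assumptions made up for illustration, not part of any proposal.

```python
import re

# Hypothetical rule: the finest field written in the string picks the unit.
# Patterns and unit codes are illustrative assumptions only.
_PATTERNS = [
    (r"\d{4}", "Y"),                                 # '1970'
    (r"\d{4}-\d{2}", "M"),                           # '1970-03'
    (r"\d{4}-\d{2}-\d{2}", "D"),                     # '1970-03-12'
    (r"\d{4}-\d{2}-\d{2}T\d{2}", "h"),               # '1970-03-12T12'
    (r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}", "m"),         # '1970-03-12T12:00'
    (r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}", "s"),   # '1970-03-12T12:00:30'
]


def infer_unit(text):
    """Return the unit code implied by the precision of an ISO 8601 string."""
    for pattern, unit in _PATTERNS:
        if re.fullmatch(pattern, text):
            return unit
    raise ValueError("not a recognised ISO 8601 form: %r" % text)


print(infer_unit("1970"))              # Y
print(infer_unit("1970-03-12T12:00"))  # m
```

Note that no string form maps to business days under such a rule, which is exactly the gap mentioned above: that unit would still have to be requested explicitly.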
> 2. Even more important to me, however, is the issue of date parsing.
> The mx library does many things badly but it does do a great job of
> parsing dates of many formats. When you parse '1/1/95' or '1995-01-01'
> it knows that you mean 19950101 which is really nice. I believe the
> scipy timeseries code for parsing dates is based on it. I would
> highly suggest starting with that level of functionality. The one
> major issue with it is an uninterpretable date doesn't throw an error
> but becomes whatever date is right now. That is obviously
Umm, that may get quite complex. E.g. does '1/2/95' refer to February
the 1st or January the 2nd? There are so many date formats and
standards that using external parser code (like mx, TimeSeries
or even datetime/strptime) for them may be preferable. I think
ISO 8601 is enough for basic, well-defined time string support, at
least to start with.
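The ambiguity is easy to show with the standard library alone. The sketch below parses the same string under two explicit `strptime` formats and gets two different dates, while the ISO 8601 form has only one reading. The variable names are just for illustration.

```python
from datetime import date, datetime

text = "1/2/95"

# The same string gives two different dates depending on the assumed order.
month_first = datetime.strptime(text, "%m/%d/%y").date()  # January the 2nd
day_first = datetime.strptime(text, "%d/%m/%y").date()    # February the 1st

print(month_first)  # 1995-01-02
print(day_first)    # 1995-02-01

# An ISO 8601 date string is unambiguous and needs no format argument.
iso = date.fromisoformat("1995-01-01")
print(iso)          # 1995-01-01
```

With an explicit format the caller takes responsibility for the interpretation, which is why leaving the non-ISO cases to an external parser looks reasonable.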
> 3. Finally, my current implementation uses floats and uses nan to represent
> an invalid date. When you assign None to an element of a date array
> it uses nan as the value. When you assign a real date it puts in the
> equivalent floating point value. I have found this to be hugely
> beneficial and just wanted to float the idea of reserving a value to
> indicate the floating point equivalent of nan. People might prefer
> masked arrays as a solution, but I just wanted to float the idea.
Good news! Our next proposal includes a "Not a Time" value which came
around due to the impossibility of converting some times into business
days. Stay tuned.
However, I should point out that the NaT value isn't as powerful as the
floating-point NaN, since the former means nothing to the hardware,
and patching that in all cases would make computations considerably
slower. Using floating-point values doesn't look like an option
anymore, since they don't have a fixed precision for a given time unit.
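Both points can be sketched in plain Python. The sentinel value chosen for `NAT` (the smallest 64-bit integer) and the helper `add_ticks` are assumptions for illustration only; nothing here is the proposal's actual implementation.

```python
import math

# Assumption: NaT is stored as a reserved integer, here the minimum int64.
NAT = -2**63


def add_ticks(a, b):
    """Add two integer tick counts, propagating NaT by an explicit check."""
    if a == NAT or b == NAT:
        return NAT  # software has to test for the sentinel on every operation
    return a + b


# A floating-point NaN propagates by itself, with no check in our code.
nan = float("nan")
print(math.isnan(nan + 1.0))  # True
print(add_ticks(5, NAT) == NAT)  # True, but only thanks to the explicit test

# Floats have no fixed precision for a unit: at about 1.2e18 nanoseconds
# since the epoch (mid 2008), adding a single nanosecond changes nothing.
t_ns = 1_217_358_000_000_000_000
lost = float(t_ns) + 1.0 == float(t_ns)
print(lost)  # True
```

The extra comparison in `add_ticks` is the cost referred to above: it has to be paid in every integer operation, while the float NaN comes for free but drags the precision problem along.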
Ivan Vilata i Balaguer @ Intellectual Monopoly hinders Innovation! @
http://www.selidor.net/ @ http://www.nosoftwarepatents.com/ @