[Numpy-discussion] Implicit conversion of python datetime to numpy datetime64?

Benjamin Root ben.root@ou....
Wed Feb 15 10:36:16 CST 2012


On Wed, Feb 15, 2012 at 8:29 AM, Benjamin Root <ben.root@ou.edu> wrote:

>
>
> On Tuesday, February 14, 2012, Mark Wiebe <mwwiebe@gmail.com> wrote:
> > On Tue, Feb 14, 2012 at 9:37 PM, Benjamin Root <ben.root@ou.edu> wrote:
> >
> > On Tuesday, February 14, 2012, Mark Wiebe <mwwiebe@gmail.com> wrote:
> >> On Tue, Feb 14, 2012 at 8:17 PM, Benjamin Root <ben.root@ou.edu> wrote:
> >>>
> >>> Just a thought I had.  Right now, I can pass a list of python ints or
> floats into np.array() and get a numpy array with a sensible dtype.  Is
> there any reason why we can't do the same for python's datetime?  Right
> now, it is very easy for me to make a list comprehension of datetime
> objects using strptime(), but it is very awkward to make a numpy array out
> of it.
> >>
> >> I would consider this a bug, it's not behaving sensibly at present.
> Here's what it does for me:
> >>
> >> In [20]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12",
> >>     ...: "07/22/98", "12/12/12"]], dtype="M8")
> >
> > Well, I guess it would be nice if I didn't even have to provide the
> dtype (I.e., inferred from the datetime type, since we aren't talking about
> strings).  But I hadn't noticed the above, I was just making object arrays.
> >
> >> ---------------------------------------------------------------------------
> >> TypeError                                 Traceback (most recent call last)
> >> C:\Python27\Scripts\<ipython-input-20-d3b7b5392190> in <module>()
> >>       1 np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12",
> >> ----> 2 "07/22/98", "12/12/12"]], dtype="M8")
> >>
> >> TypeError: Cannot cast datetime.datetime object from metadata [us] to [D] according to the rule 'same_kind'
> >>
> >> In [21]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12",
> >>     ...: "07/22/98", "12/12/12"]], dtype="M8[us]")
> >> Out[21]:
> >> array(['2012-02-02T16:00:00.000000-0800',
> >>        '1998-07-21T17:00:00.000000-0700', '2012-12-11T16:00:00.000000-0800'], dtype='datetime64[us]')
> >>
> >> In [22]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12",
> >>     ...: "07/22/98", "12/12/12"]], dtype="M8[us]").astype("M8[D]")
> >> Out[22]: array(['2012-02-03', '1998-07-22', '2012-12-12'], dtype='datetime64[D]')
> >>>
> >>> The only barrier I can think of is those who have already built code
> around an object dtype array of datetime objects.
> >>>
> >>> Thoughts?
> >>> Ben Root
> >>>
> >>> P.S. - what ever happened to arange() and linspace() for datetime64?
> >>
> >> arange definitely works:
> >> In [28]: np.arange('2011-03-02', '2011-04-01', dtype='M8')
> >> Out[28]:
> >> array(['2011-03-02', '2011-03-03', '2011-03-04', '2011-03-05',
> >>        '2011-03-06', '2011-03-07', '2011-03-08', '2011-03-09',
> >>        '2011-03-10', '2011-03-11', '2011-03-12', '2011-03-13',
> >>        '2011-03-14', '2011-03-15', '2011-03-16', '2011-03-17',
> >>        '2011-03-18', '2011-03-19', '2011-03-20', '2011-03-21',
> >>        '2011-03-22', '2011-03-23', '2011-03-24', '2011-03-25',
> >>        '2011-03-26', '2011-03-27', '2011-03-28', '2011-03-29',
> >>        '2011-03-30', '2011-03-31'], dtype='datetime64[D]')
> >> I didn't get to implementing linspace. I did look at it, but the
> current code didn't make it a trivial thing to put in.
> >> -Mark
> >
> > Sorry, I wasn't clear about arange, I meant that it would be nice if it
> could take python datetimes as arguments (and timedelta for the step?)
> because that is much more intuitive than remembering the exact dtype code
> and string format.
> >
> > I see it as the numpy datetime64 type could take three types for its
> constructor: another datetime64, python datetime, and the standard
> unambiguous datetime string.  I should be able to use these interchangeably
> in numpy.  The same would be true for timedelta64.
> >
> > Easy interchange between pyth
> >
> > Ben Walsh actually implemented this and the code is in a pull request
> here:
> > https://github.com/numpy/numpy/pull/111
> > This didn't go in, because the datetime properties don't exist on the
> arrays after you convert them to datetime64, so there could be some
> unintuitive consequences from that. When Martin implemented the quaternion
> dtype, we discussed the possibility that dtypes could expose properties
> that show up on the array object, and if this were implemented I think the
> conversion and compatibility between python datetime and datetime64 could
> be made quite natural.
> > -Mark
> >
>
> Actually, at first glance, I don't see why this shouldn't go ahead as-is.
>  If I know I am getting datetime64, then I should expect to lose the
> features of the datetime object, right?  Sure, it would be nice if it kept
> those attributes, but keeping them would provide an inconsistent interface
> in the case of a numpy array created from datetime objects and one created
> from datetime64 objects (unless I misunderstood).
>
> I will read through the pull request more closely and comment further.
>
> Ben Root
>

Ok, I did some more testing between the master branch and the pull
request.  I suspect that something is interfering with the type conversion,
because walshb's branch pulled on top of the current master yields the same
results as the current master alone (see below).

If passed a datetime, date, time or timedelta object "without specifying
the dtype", you will get object arrays, which will, of course, allow one to
access attributes such as .year, .month, etc.

>>> np.array([date(2000, 1, 1)])
array([2000-01-01], dtype=object)
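
As a rough sketch of what that object-array case looks like in practice
(the second date is made-up sample data, and the variable names are only
for illustration), the elements stay plain Python date objects, so the
attributes are reached per element rather than through the array:

```python
import datetime

import numpy as np

# No dtype given: numpy stores the Python date objects themselves.
dates = np.array([datetime.date(2000, 1, 1), datetime.date(2001, 6, 15)])

# The array has dtype=object, so .year lives on each element,
# not on the array; a comprehension is the simplest way to collect it.
years = [d.year for d in dates]
months = [d.month for d in dates]

print(dates.dtype)
print(years, months)
```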

If passed a date object with dtype='M8', or a timedelta object with
dtype='m8', you will get a datetime64 (or timedelta64):

>>> np.array([date(2000, 1, 1)], dtype='M8')
array(['2000-01-01'], dtype='datetime64[D]')

>>> np.array([timedelta(0, 0, 0)], dtype='m8')
array([0], dtype='timedelta64[us]')

The exception noted before only happens when a datetime object is passed
in.  As an additional note, a time object passed in with dtype 'M8' will
throw a ValueError because of the decision not to support times that are
without dates.  Personally, I wonder if this should instead be treated like
a timedelta64 object, but I haven't thought through the consequences of
that yet.
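
To make the timedelta64 idea concrete, here is a minimal sketch of how a
time object could be mapped by hand to a timedelta64 measured from
midnight.  The helper name time_to_timedelta64 is hypothetical (it is not
part of numpy), and this is only one possible interpretation of a
date-less time:

```python
import datetime

import numpy as np


def time_to_timedelta64(t):
    """Hypothetical helper: offset of a datetime.time from midnight.

    Any tzinfo on the time object is ignored in this sketch.
    """
    delta = datetime.timedelta(
        hours=t.hour,
        minutes=t.minute,
        seconds=t.second,
        microseconds=t.microsecond,
    )
    # np.timedelta64 accepts a datetime.timedelta directly.
    return np.timedelta64(delta)


offset = time_to_timedelta64(datetime.time(1, 2, 3))
print(offset)
```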

I should also note a slight difference between the results from master and
from v1.6.1.  In v1.6.1, creating an array with datetime objects and
dtype='M8' works:

>>> np.array([datetime(2000, 1, 1)], dtype='M8')
array([2000-01-01 00:00:00], dtype=datetime64[us])

and for passing in a date object, the dtype is named something slightly
different (and the string repr is different):

>>>  np.array([date(2000, 1, 1)], dtype='M8')
array([2000-01-01 00:00:00], dtype=datetime64[us])

The above has a dtype of 'datetime64[us]' instead of the current
'datetime64[D]', and it displays the time part, which is not currently done
(but that is likely due to the '[D]' part of the datetime).

So, where does that leave us?  Well, I do agree that there is likely a
problem with possible existing code that expects to create an object
array.  Maybe an implicit conversion should be held off until version 2.0?
Until then, I would be happy with better documentation of the current
abilities.  The datetime64 page currently only shows how to make a
datetime64 array using strings, implying that that is the only method.
Maybe the top of that page should have a section showing how to create a
datetime64 (and timedelta64) array using both string and datetime
(timedelta) data sources.  It should also mention the need for providing
the dtype (and possibly note that future releases may not have that
requirement).
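
Something along these lines is what I have in mind for such a section.
Treat it as a sketch rather than documented behavior: the sample values
are made up, and I spell out the unit in every dtype because, as shown
above, a bare 'M8' with datetime objects behaves differently between
versions:

```python
import datetime

import numpy as np

# From ISO 8601 strings.
from_strings = np.array(["2000-01-01", "2000-01-02"], dtype="M8[D]")

# From Python date objects, with an explicit day unit.
from_dates = np.array(
    [datetime.date(2000, 1, 1), datetime.date(2000, 1, 2)], dtype="M8[D]"
)

# From Python datetime objects: ask for microseconds (the resolution of
# datetime itself), then cast down to days if only the date matters.
from_datetimes = np.array(
    [datetime.datetime(2000, 1, 1), datetime.datetime(2000, 1, 2)],
    dtype="M8[us]",
)
as_days = from_datetimes.astype("M8[D]")

# From Python timedelta objects.
from_timedeltas = np.array([datetime.timedelta(days=1)], dtype="m8[us]")
```

The explicit cast through 'M8[us]' is the same workaround Mark showed
earlier in the thread.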

Cheers!
Ben Root

P.S. - the need for linspace has come up for me multiple times.  I might
try putting something together.
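
A first rough sketch of what I mean, assuming it is acceptable to go
through the underlying integer tick counts and round to the nearest tick.
The name datetime_linspace and its unit parameter are hypothetical; this
is not an existing numpy function:

```python
import numpy as np


def datetime_linspace(start, stop, num, unit="us"):
    """Hypothetical helper: num evenly spaced datetime64 values.

    start and stop may be anything np.datetime64 accepts for the given
    unit (ISO strings, datetime64, Python datetime objects).
    """
    start = np.datetime64(start, unit)
    stop = np.datetime64(stop, unit)
    # Interpolate on the integer tick counts since the epoch.
    ticks = np.linspace(start.astype("int64"), stop.astype("int64"), num)
    # Round to whole ticks before converting back to datetime64.
    return np.round(ticks).astype("int64").astype("M8[%s]" % unit)


days = datetime_linspace("2012-01-01", "2012-01-05", 5, unit="D")
print(days)
```

Going through float64 in np.linspace limits the precision for very fine
units or very distant dates, so this would need more care before it could
go anywhere near numpy itself.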