[Numpy-discussion] fixing up datetime

Mark Wiebe mwwiebe@gmail....
Tue Jun 7 19:10:03 CDT 2011


On Tue, Jun 7, 2011 at 6:53 PM, Pierre GM <pgmdevlist@gmail.com> wrote:

>
> On Jun 8, 2011, at 1:16 AM, Mark Wiebe wrote:
>
> > Hi Dave,
> >
> > Thanks for all the feedback on the datetime, it's very useful to help
> understand the timeseries ideas, in particular with the many examples you're
> sprinkling in.
> >
> > One overall impression I have about timeseries in general is the use of
> the term "frequency" synonymously with the time unit. To me, a frequency is
> a numerical quantity with a unit of 1/(time unit), so while it's related to
> the time unit, naming it the same is something the specific timeseries
> domain has chosen to do. I think the numpy datetime class shouldn't have
> anything called "frequency" in it, and I would like to remove the current
> usage of that terminology from the codebase.
>
> True. We rather abused the term in scikits.timeseries, but we meant it as
> "given time unit".
> Matt came up with the idea of representing a series of consecutive dates as an
> array of consecutive integers. The conversion integer<>datetime is done
> internally with an epoch and a unit. Initially, we called this latter
> frequency, but in the experimental git version I switched to unit. Anyhow,
> each time you read 'frequency' in scikits.timeseries, think 'unit'.


Sounds good.

> In Wes's comment, he said
> >
> > I'm hopeful that the datetime64 dtype will enable scikits.timeseries
> > and pandas to consolidate much of their datetime / frequency code.
> > scikits.timeseries has a ton of great stuff for generating dates with
> > all the standard fixed frequencies.
> >
> > implying to me that the important functionality needed in time series is
> the ability to generate arrays of dates in specific ways. I suspect equating
> the specification of the array of dates and the unit of precision used to
> store the date isn't good for either the datetime functionality or
> supporting timeseries, and I'm presently trying to understand what it is
> that timeseries uses.
>
> You want a series of 365 consecutive days from today ? 'now' +
> np.arange(365). This kind of stuff.
>

This one works:

>>> np.datetime64('today') + np.arange(365)
array(['2011-06-07', '2011-06-08', '2011-06-09', '2011-06-10',
       '2011-06-11', '2011-06-12', '2011-06-13', '2011-06-14',
       '2011-06-15', '2011-06-16', '2011-06-17', '2011-06-18',
       '2011-06-19', '2011-06-20', '2011-06-21', '2011-06-22',
       '2011-06-23', '2011-06-24', '2011-06-25', '2011-06-26',
       '2011-06-27', '2011-06-28', '2011-06-29', '2011-06-30',
       '2011-07-01', '2011-07-02', '2011-07-03', '2011-07-04',
       '2011-07-05', '2011-07-06', '2011-07-07', '2011-07-08',
       '2011-07-09', '2011-07-10', '2011-07-11', '2011-07-12',
       '2011-07-13', '2011-07-14', '2011-07-15', '2011-07-16',
       '2011-07-17', '2011-07-18', '2011-07-19', '2011-07-20',
<snip>
       '2012-05-28', '2012-05-29', '2012-05-30', '2012-05-31',
       '2012-06-01', '2012-06-02', '2012-06-03', '2012-06-04',
       '2012-06-05'], dtype='datetime64[D]')
>>>



> > On Tue, Jun 7, 2011 at 7:34 AM, Dave Hirschfeld <
> dave.hirschfeld@gmail.com> wrote:
> >
> > I think some of the complexity is coming from the definition of the
> timedelta.
> > In the timeseries package each date simply represents the number of
> periods
> > since the epoch and the difference between dates is therefore just an
> integer
> > with no attached metadata - its meaning is determined by the context it's
> used
> > in. e.g.
>
> Exactly that.
>
> > timeseries gets on just fine without a timedelta type - a timedelta is
> just an
> > integer and if you add an integer to a date it's interpreted as the
> number of
> > periods of that date's frequency. From a usability point of view M1 + 1
> is
> > much nicer than having to do something like M1 + ts.TimeDelta(M1.freq,
> 1).
>
> Likewise, the difference between two dates is just an integer.
>
> [Mark]
> > I think the timedelta is important, especially with the large number of
> units NumPy's datetime supports. When you're subtracting two nanosecond
> datetimes and two minute datetimes in the same code, having the units there
> to avoid confusion is pretty useful.
>
> Indeed.
>
> >  I don't envision 'asfreq' being a datetime function, this is the kind of
> thing that would layer on top in a specialized timeseries library. The
> behavior of timedelta follows a more physics-like idea with regard to the
> time unit, and I don't think something more complicated belongs at the
> bottom layer that is shared among all datetime uses.
>
> 'asfreq' converts from one unit to another (there's another function,
> convert, that does not do quite the same thing, but I won't get into
> details here). You'll probably have to take unit conversion into account if
> you allow the .view() or .astype() methods on your np.datetime array...
>

It supports .astype(), with a truncation policy. This is motivated partly
because that's how Python's integer division works, and partly because if
you consider a full datetime '2011-03-14T13:22:16', it's natural to think of
the year as '2011', the date as '2011-03-14', etc., which is truncation. With
regard to converting in the other direction, you can think of a datetime as
representing a single moment in time, regardless of its unit of precision,
and equate '2011' with '2011-01', etc.
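
As a rough illustration of the truncation policy described above, here is a
minimal sketch using the datetime64 string constructor and .astype() as they
behave in current NumPy releases. The variable names are only for
illustration, and the exact behaviour of the development branch discussed in
this thread may have differed.

```python
import numpy as np

# A full datetime stored with second precision.
full = np.datetime64('2011-03-14T13:22:16')

# Converting to a coarser unit drops the finer fields (truncation).
year = full.astype('datetime64[Y]')
day = full.astype('datetime64[D]')

# Converting to a finer unit keeps the same starting moment:
# the year 2011 is equated with its first month.
as_month = np.datetime64('2011').astype('datetime64[M]')

print(year, day, as_month)  # 2011 2011-03-14 2011-01
```

The coarser values compare equal to what you would get by writing the shorter
string directly, e.g. np.datetime64('2011-03-14').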

> In [80]: ts.Date('S', (_64.value + _65.value)//2)
> > Out[80]: <S : 02-Jul-2011 12:00:00>
> >
> > Adding dates definitely doesn't work, because datetimes have no zero, but
> I would express it like this:
>
> Well, it can be argued that the epoch is 0... But in scikits.timeseries,
> keep in mind that underneath, a DateArray is just an array of integers.
>

Yeah, that's the implementation, but letting the abstraction leak doesn't
provide a real benefit I can see.
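
One way to compute a midpoint without adding two datetimes is to go through
the difference, which is a timedelta carrying its own unit. The sketch below
is an assumption about how this could be written with current NumPy, not the
expression from the original message; the two endpoint dates are made up for
the example.

```python
import numpy as np

# Hypothetical endpoints, stored with second precision.
start = np.datetime64('2011-01-01T00:00:00')
end = np.datetime64('2012-01-01T00:00:00')

# Adding two datetimes is rejected, since a datetime has no zero.
try:
    start + end
    addition_allowed = True
except TypeError:
    addition_allowed = False

# Subtracting gives a timedelta64 that carries the unit (seconds here).
span = end - start

# Halve the span and add it back to the earlier datetime.
midpoint = start + span // 2

print(addition_allowed)  # False
print(midpoint)          # 2011-07-02T12:00:00
```

The integer representation never appears in user code here; the unit travels
with the timedelta.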

[Dave]
> > I really like the idea of being able to specify multiples of the base
> frequency
> > - e.g. [7D] is equivalent to [W], not least because it provides an
> easy
> > way to specify quarters [3M] or seasons [6M] which are important in my
> work.
> > NB: I also deal with half-hourly and quarter-hourly timeseries and I'm
> sure
> > there are many other examples which are all made possible by allowing
> > multipliers.
>
> Well, the experimental version kinda allowed that...
>
> >
> > This is one of the things where I think mixing the datetime storage
> precision with timeseries frequency seems counterproductive. Having
> different origins for datetime64 starting on different weekdays near
> 1970-01-01 doesn't seem like the right way to tackle the problem to me. I
> see other valid reasons for reintroducing the origin metadata, but this one
> I don't really like.
>
> We needed the concept to convert time series, for example from monthly to
> quarterly (what is the first month of the year (as in succession of 12
> months) you want to start with ?)


Does that need to be in the underlying datetime for layering a good
timeseries implementation on top?
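
To make the question concrete, here is a minimal sketch of how a
monthly-to-quarterly grouping with a configurable first month could be layered
on top of a plain datetime64[M] array, with no origin metadata in the dtype.
The function quarter_start and its first_month parameter are hypothetical
names for this example, not part of NumPy or scikits.timeseries.

```python
import numpy as np

def quarter_start(months, first_month=1):
    """Map each month to the first month of its quarter (hypothetical helper).

    first_month (1-12) is the month in which the year of quarters begins.
    """
    months = np.asarray(months, dtype='datetime64[M]')
    # Months elapsed since 1970-01, as plain integers.
    n = months.astype(np.int64)
    shift = first_month - 1
    # Floor to a multiple of 3 months, measured from the chosen first month.
    start = (n - shift) // 3 * 3 + shift
    return start.astype('datetime64[M]')

months = np.arange('2011-01', '2011-07', dtype='datetime64[M]')
calendar_q = quarter_start(months)                # quarters start in Jan
shifted_q = quarter_start(months, first_month=2)  # quarters start in Feb
print(calendar_q)
print(shifted_q)
```

The starting month lives in the function argument rather than in the stored
dates, which is the kind of layering the question is asking about.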

-Mark