[Numpy-discussion] NumPy date/time types and the resolution concept
Mon Jul 14 11:50:21 CDT 2008
On Monday 14 July 2008, Pierre GM wrote:
> On Monday 14 July 2008 09:07:47 Francesc Alted wrote:
> > The advantage of this abstraction is that the user can easily
> > choose the scale of resolution that best fits his need. I'm
> > thinking of providing the following resolutions:
> > ["femtosec", "picosec", "nanosec", "microsec", "millisec", "sec",
> > "min", "hour", "month", "year"]
> In TimeSeries, we don't have anything less than a second, but we
> have 'daily', 'business daily', 'weekly' and 'quarterly' resolutions.
Yes, I forgot the "day" resolution. I suppose that "weekly"
and "quarterly" could be added too. However, if we adopt a new way to
specify the resolution (see below), these can be stated as '7d'
and '3m' respectively. Mmh, not sure about "business daily"; this
may be useful in time series, but I can't find a reasonable meaning
for it as a 'time resolution' (which is a different concept from 'time
frequency'). So I'd leave it out.
> A very useful point that Matt Knox had coded is the possibility to
> specify starting points for switching from one resolution to another.
> For example, you can have a series with a 'ANN_MAR' frequency, that
> corresponds to 1 point a year, the year starting in April. When
> switching back to a monthly resolution, the points from January to
> March of the first year will be masked.
Ok. Ann was also suggesting that the origin of time would be
configurable, but then, you are talking about *masking* values. Mmm, I
don't think we should try to incorporate masking capabilities in the
NumPy date/time types.
At any rate, I've not thought about the possibility of having an origin
defined by the user, but if we could add the 'resolution' metainfo, I
don't see why we couldn't do the same with the 'origin' metainfo too.
> Another useful point would be to allow the user to define his/her own
> resolution (every 15min, every 12h...). Right now it's a bit clunky
> in TimeSeries, we have to use the lowest resolution of the series
> (min, hour) and leave a lot of blanks (TimeSeries don't have to be
> regularly spaced, but it helps...)
Ok. I see the use case for this, but for implementation purposes, we
should come up with a more complete way to specify the resolution than I
had in mind before. Hmm, what about the following:
  [N]timeunit
where ``timeunit`` can take the values in:
['y', 'm', 'd', 'h', 'm', 's', 'ms', 'us', 'ns', 'fs']
so, for example, '14d' means a resolution of 14 days, and '10ms' means a
resolution of one hundredth of a second. Sounds good to me. What other
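To make the idea concrete, here is a minimal sketch of how resolution strings such as '14d' and '10ms' could be parsed. It is only an illustration, not an implementation from this thread. The function name ``parse_resolution`` is hypothetical. Since the unit list above contains 'm' twice, the sketch assumes that 'm' means months (as in '3m' for quarterly) and uses 'min' for minutes; it also assumes that a missing count means 1.

```python
import re

# Unit codes, largest to smallest. ASSUMPTION: 'm' is months and 'min' is
# minutes, because the list in the proposal contains 'm' twice.
UNITS = ("y", "m", "d", "h", "min", "s", "ms", "us", "ns", "fs")

# An optional run of digits followed by a run of lowercase letters.
_PATTERN = re.compile(r"(\d*)([a-z]+)")


def parse_resolution(spec):
    """Split a string like '14d' into (count, unit), e.g. (14, 'd').

    Hypothetical helper: a missing count is taken to mean 1.
    """
    match = _PATTERN.fullmatch(spec)
    if match is None:
        raise ValueError(f"malformed resolution: {spec!r}")
    count_text, unit = match.groups()
    if unit not in UNITS:
        raise ValueError(f"unknown time unit: {unit!r}")
    count = int(count_text) if count_text else 1
    if count < 1:
        raise ValueError(f"count must be positive: {spec!r}")
    return count, unit


print(parse_resolution("14d"))   # (14, 'd')
print(parse_resolution("10ms"))  # (10, 'ms')
print(parse_resolution("3m"))    # (3, 'm'), i.e. quarterly
```

Malformed strings such as '5x' or '0d' raise ``ValueError`` in this sketch.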
> > Now comes the tricky part: how to integrate the notion
> > of 'resolution' with the 'dtype' data type factory of NumPy?
> In TimeSeries, the frequency is stored as an integer. For example, a
> daily frequency is stored as 6000, an annual frequency as 1000, a
> 'ANN_MAR' frequency as 1003...
Well, I initially planned to keep the resolution as an enumerated value
(int8 would be enough), but if the new way to specify resolutions goes
ahead, I'm afraid that we may need a full int64 to store this. But apart
from that, this should not be a problem (in general, the metainfo is a
very tiny part of the space taken by a dataset).
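As a rough illustration of why a single int64 would be enough, one possible layout keeps the unit as a small enumerated code in the low 8 bits and the count in the remaining bits. This layout, the names ``pack_resolution`` and ``unpack_resolution``, and the 'min' code for minutes are all assumptions made for the sketch; nothing of the sort has been decided here.

```python
# ASSUMPTION: same unit codes as in the proposal, with 'min' for minutes.
UNITS = ("y", "m", "d", "h", "min", "s", "ms", "us", "ns", "fs")

UNIT_BITS = 8                      # low bits hold the enumerated unit
UNIT_MASK = (1 << UNIT_BITS) - 1
MAX_COUNT = 1 << (63 - UNIT_BITS)  # keep the result inside a signed int64


def pack_resolution(count, unit):
    """Encode (count, unit) as one integer that fits in an int64."""
    code = UNITS.index(unit)       # raises ValueError for unknown units
    if not 1 <= count < MAX_COUNT:
        raise ValueError(f"count out of range: {count}")
    return (count << UNIT_BITS) | code


def unpack_resolution(packed):
    """Inverse of pack_resolution."""
    return packed >> UNIT_BITS, UNITS[packed & UNIT_MASK]


packed = pack_resolution(14, "d")
print(packed)                     # 3586
print(unpack_resolution(packed))  # (14, 'd')
```

With this layout a plain unit such as 's' is just the count 1 plus the unit code, so the int8 enumeration is still there in the low byte.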