[Numpy-discussion] The date/time dtype and the casting issue
Wed Jul 30 05:35:32 CDT 2008
On Wednesday 30 July 2008, Ivan Vilata i Balaguer wrote:
> Pierre GM (on 2008-07-29 at 15:47:52 -0400) said::
> > On Tuesday 29 July 2008 15:14:13 Ivan Vilata i Balaguer wrote:
> > > Pierre GM (on 2008-07-29 at 12:38:19 -0400) said::
> > > > > Relative time versus relative time
> > > > > ----------------------------------
> > > > >
> > > > > This case would be the same as the previous one (absolute
> > > > > vs absolute). Our proposal is to forbid this operation if
> > > > > the time units of the operands are different.
> > > >
> > > > Mmh, less sure on this one. Can't we use a hierarchy of time
> > > > units, and force to the lowest ?
> > > >
> > > > For example:
> > > > >>> numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[M]")
> > > > array([15,15,15], dtype="t8['M']")
> > > >
> > > > I agree that adding ns to years makes no sense, but ns to s ?
> > > > min to hr or days ? In short: systematically raising an
> > > > exception looks a bit too drastic. There are some simple
> > > > unambiguous cases that should be allowed (Y+M, Y+Q, M+Q, H+D...)
> > >
> > > Do you mean using the most precise unit for operations with "near
> > > enough", different units? I see the point, but what makes me
> > > doubt about it is giving the user the false impression that the
> > > most precise unit is *always* expected. I'd rather spare the
> > > user as many surprises as possible, by simplifying rules in
> > > favour of explicitness (but that may be debated).
> > Let me rephrase:
> > Adding different relative time units should be allowed when there's
> > no ambiguity on the output:
> > For example, a relative year timedelta is always 12 month
> > timedeltas, or 4 quarter timedeltas. In that case, I should be
> > able to do:
> > >>>numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[M]")
> > array([15,15,15], dtype="t8['M']")
> > >>>numpy.ones(3, dtype="t8[Y]") + 3*numpy.ones(3, dtype="t8[Q]")
> > array([7,7,7], dtype="t8['Q']")
> > Similarly:
> > * an hour is always 3600s, so I could add relative s/ms/us/ns
> > timedeltas to hour timedeltas, and get the result in s/ms/us/ns.
> > * A day is always 24h, so I could add relative hours and days
> > timedeltas and get an hour timedelta
> > * A week is always 7d, so W+D -> D
> > However:
> > * We can't tell beforehand how many days are in any month, so
> > adding relative days and months would raise an exception.
> > * Same thing with weeks and months/quarters/years
> > There'll be only a limited number of time units, therefore a
> > limited number of potential combinations between time units. It'd
> > be just a matter of listing which ones are allowed and which ones
> > will raise an exception.
> That's "keep the precision" over "keep the range". At first Francesc
> and I opted for "keep the range" because that's what NumPy does, e.g.
> when operating an int64 with an uint64. Then, since we weren't sure
> about what the best choice would be for the majority of users, we
> decided upon letting (or forcing) the user to be explicit. However,
> the use of time units and integer values is precisely intended to
> "keep the precision", and overflow won't be so frequent given the
> correct time unit and the span of uint64, so you may be right in the
> end. :)
Well, I do think that the "keep the precision" rule can be quite a
sensible approach for this case, so I am in favor of it. Also,
Pierre's suggestion of allowing automatic casting for all the time units
except when the 'Y'ear and 'M'onth are involved makes a lot of sense
too. I'll adopt these for the third version of the proposal then.
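
To make the rule concrete, here is a minimal pure-Python sketch of the
"keep the precision" casting discussed above. It is only an illustration:
the tables, the unit codes ("h" and "m" for hour and minute) and the
add_deltas() helper are all hypothetical and not part of NumPy or of the
proposal. Units with a fixed ratio are combined in the finer unit; calendar
units (Y, Q, M) and fixed-length units (W and below) never mix.

```python
# Illustrative only: these tables and add_deltas() are hypothetical,
# not part of NumPy or of the proposal text.

# Calendar units, expressed in months.
MONTHS_PER = {"Y": 12, "Q": 3, "M": 1}

# Fixed-length units, expressed in nanoseconds.
NS_PER = {
    "W": 7 * 24 * 3600 * 10**9,
    "D": 24 * 3600 * 10**9,
    "h": 3600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}


def add_deltas(a, unit_a, b, unit_b):
    """Add two relative times, returning (value, unit) in the finer unit.

    Raises TypeError when the two units have no fixed ratio
    (for example days and months).
    """
    for table in (MONTHS_PER, NS_PER):
        if unit_a in table and unit_b in table:
            # The finer unit is the one with the smaller base value.
            fine = unit_a if table[unit_a] <= table[unit_b] else unit_b
            total = (a * (table[unit_a] // table[fine])
                     + b * (table[unit_b] // table[fine]))
            return total, fine
    raise TypeError(
        "cannot add %r and %r deltas: no fixed ratio" % (unit_a, unit_b)
    )


print(add_deltas(1, "Y", 3, "M"))  # (15, 'M')
print(add_deltas(1, "Y", 3, "Q"))  # (7, 'Q')
print(add_deltas(1, "W", 2, "D"))  # (9, 'D')
```

With only a handful of units, the allowed combinations could just as well
be listed explicitly, as Pierre suggests; two lookup tables are merely one
compact way to express the same thing.
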
> > > > > Note: we refused to use the ``.astype()`` method because of
> > > > > the additional 'time_reference' parameter that will sound
> > > > > strange for other typical uses of ``.astype()``.
> > > >
> > > > A method would be really, really helpful, though...
> > > > [...]
> > >
> > > Yay, but what doesn't seem to fit for me is that the method would
> > > only make sense for time values.
> > Well, what about a .tounit(new_unit, reference=None) ?
> > By default, the reference would be None and default to the POSIX
> > epoch. We could also go for .totunit (for to time unit)
> Yes, that'd be the signature for a method. The ``reference``
> argument shouldn't be allowed for ``datetime64`` values (absolute
> times, no ambiguities) but it should be mandatory for ``timedelta64``
> ones. Sorry, but I can't see the use of having a default reference,
> unless one wanted to work with Epoch-based deltas, which looks like
> an extremely particular case. Could you please show me a use case
> for having a reference defaulting to the POSIX epoch?
Yeah, I agree with Ivan that a default reference time makes little
sense for general relative times. IMO, provided that we will be
allowing implicit casting for most time units in relative vs
relative and absolute vs relative operations, forced casting will
not be needed as often, and a function would be enough. Having said
that, I still see the merit of a method for some situations, so I'll
mention that in the third proposal as a possible improvement.
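
As a rough illustration of why a reference is needed when forcing a cast
between units without a fixed ratio, here is a small sketch using only the
standard library. The function name months_to_days() is hypothetical (it is
not the proposed API), and it assumes the day of the reference date also
exists in the target month.

```python
from datetime import date


def months_to_days(n_months, reference):
    """Convert a relative time in months to days, counted from `reference`.

    Hypothetical helper for illustration only. Raises ValueError if the
    day of `reference` does not exist in the target month (e.g. day 31).
    """
    # Work in whole months since year 0 so that divmod handles
    # year rollover, including negative offsets.
    total = reference.year * 12 + (reference.month - 1) + n_months
    year, month0 = divmod(total, 12)
    target = date(year, month0 + 1, reference.day)
    return (target - reference).days


# The same one-month delta spans a different number of days
# depending on where it starts.
print(months_to_days(1, date(2008, 2, 1)))  # 29
print(months_to_days(1, date(2008, 7, 1)))  # 31
```

The result depends entirely on the reference date, which is why no single
default reference is meaningful for general relative times.
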