[Numpy-discussion] code review for datetime arange

Bruce Southey bsouthey@gmail....
Fri Jun 10 10:03:16 CDT 2011

On 06/10/2011 09:18 AM, Mark Wiebe wrote:
> On Fri, Jun 10, 2011 at 12:56 AM, Ralf Gommers
> <ralf.gommers@googlemail.com> wrote:
>     On Fri, Jun 10, 2011 at 1:54 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
>         On Thu, Jun 9, 2011 at 5:21 PM, Ralf Gommers
>         <ralf.gommers@googlemail.com> wrote:
>             On Thu, Jun 9, 2011 at 11:54 PM, Mark Wiebe
>             <mwwiebe@gmail.com> wrote:
>                 On Thu, Jun 9, 2011 at 4:27 PM, Ralf Gommers
>                 <ralf.gommers@googlemail.com> wrote:
>                     On Thu, Jun 9, 2011 at 10:58 PM, Mark Wiebe
>                     <mwwiebe@gmail.com> wrote:
>                         On Thu, Jun 9, 2011 at 3:41 PM, Christopher
>                         Barker <Chris.Barker@noaa.gov> wrote:
>                     Your branch works fine for me (OS X, py2.6), no
>                     failures. Only a few deprecation warnings like:
>                     /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py:336:
>                     DeprecationWarning: DType strings 'O4' and 'O8'
>                     are deprecated because they are platform specific.
>                     Use 'O' instead
>                       callableObj(*args, **kwargs)
>                 It looks like there are some '|O4' dtypes in
>                 'lib/tests/test_format.py', testing the .npy file
>                 format. I'm not sure why I'm not getting this warning
>                 though.
>                             Mark Wiebe wrote:
>                             > Because of the nature of datetime and timedelta, arange has to be
>                             > slightly different than with all the other types. In particular, for
>                             > datetime the primary signature is np.arange(datetime, datetime, timedelta).
>                             >
>                             > I've implemented a simple extension which allows for another way to
>                             > specify a date range, as np.arange(datetime, timedelta, timedelta).
>                     Did you think about how to document which of these
>                     basic functions work with datetime? I don't think
>                     that belongs in the docstrings, but it may then be
>                     hard for the user to figure out which functions
>                     accept datetimes. And there will be no usage
>                     examples in the docstrings.
>                  I think documenting it in a 'datetime' section of the
>                 arange documentation would be reasonable. The main
>                 datetime documentation page would also mention the
>                 functions that are most useful.
>                     Besides docs, I am not sure about your choice to
>                     modify functions like arange instead of writing a
>                     module of wrapper functions for them that know
>                     what to do with the dtype. If you have a module
>                     you can group all relevant functions, so they're
>                     easy to find. Plus it's more future-proof - if at
>                     some point numpy grows another new dtype, just
>                     create a new module with wrapper funcs for that dtype.
>                 The facts that datetime and timedelta are related in a
>                 particular way different from other data types, and
>                 that they are parameterized types, both contribute to
>                 them not fitting naturally the current structure of
>                 NumPy. I'm not sure I understand the module idea,
>             Basically, use np.datetime.arange, which understands the
>             dtype and then calls np.arange under the hood. Or it is just its
>             own function, like the dtrange() Robert just suggested.
>             It's pretty much the same as for the ma module, which
>             reimplements or wraps many numpy functions that do not
>             understand masked arrays.
>         I'm not a big fan of the way the ma module works, it doesn't
>         integrate naturally and orthogonally with all the other
>         features of NumPy. It's also an array subtype, quite different
>         from a dtype. We don't have np.bool.arange, np.int8.arange,
>         etc., and the abstraction used by arange built into the custom
>         data type mechanism is too weak to support the needs of datetime.
>         I'd like to use the requirements of datetime as a guide
>         to molding the future design of the data type system, and if
>         we make datetime a second-class citizen because it doesn't
>         behave like a float, we're not going to be able to discover
>         the possibilities.
>                 I would rather think that since it's a built-in NumPy
>                 data type, it should work with the regular NumPy
>                 functions wherever that makes sense.
>             That doesn't make sense to me. Being a dtype that happens
>             to be shipped with numpy doesn't make it more special than
>             other dtypes.
>         This isn't making it more special, it's just conforming the
>         natural NumPy way to how datetime/timedelta operates.
>     Maybe I'm misunderstanding this, and once you make a function work
>     for datetime it would also work for other new dtypes. But my
>     impression is that that's not the case. Let's say I make a new
>     dtype with distance instead of time attached. Would I be able to
>     use it with arange, or would I have to go in and change the arange
>     implementation again to support it?
> Ok, I think I understand the point you're driving at now. We need 
> NumPy to have the flexibility so that external plugins defining custom 
> data types can do the same thing that datetime does, having datetime 
> be special compared to those is undesirable. This I wholeheartedly 
> agree with, and the way I'm coding datetime is driving in that direction.
> The state of the NumPy codebase, however, prevents jumping straight to 
> such a solution, since there are several mechanisms and layers of such 
> abstraction already which themselves do not satisfy the needs of 
> datetime and other similar types. Also, arange is just one of a large 
> number of functions which could be extended individually for different 
> types, for example if one wanted to make a unit quaternion data type 
> for manipulating rotations, its needs would be significantly 
> different. Because I can't see the big picture from within the world 
> of datetime, and because such generalization takes a great amount of 
> effort, I'm instead making these changes minimally invasive on the 
> current codebase.
> This approach is along the lines of "lifting" in generic programming. 
> First you write your algorithm in the specific domain, and work out 
> how it behaves there. Then, you determine what are the minimal 
> requirements of the types involved, and abstract the algorithm 
> appropriately. Jumping to the second step directly is generally too 
> difficult; an incremental path to the final goal must be used.
> http://www.generic-programming.org/about/intro/lifting.php
> Cheers,
> Mark
>     Ralf
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion@scipy.org <mailto:NumPy-Discussion@scipy.org>
>     http://mail.scipy.org/mailman/listinfo/numpy-discussion
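As an illustration of the "lifting" idea in the quoted message (and of Ralf's question about a distance-like dtype), here is a minimal sketch in plain Python rather than NumPy: an arange written once against a small set of assumed operations, which then works unchanged for integers, exact fractions, and calendar dates. The name generic_arange and the listed requirements are hypothetical, chosen for illustration; they are not NumPy's custom-dtype interface.

```python
import math
from datetime import date, timedelta
from fractions import Fraction


def generic_arange(start, stop, step):
    """Return [start, start + step, ...] up to, but excluding, stop.

    Hypothetical "lifted" requirements on the types involved:
      point - point             -> difference
      difference / difference   -> real number (used only to count the steps)
      int * difference          -> difference
      point + difference        -> point
    """
    if not step:
        raise ValueError("step must be nonzero")
    # Number of steps that fit in the half-open interval [start, stop).
    count = math.ceil((stop - start) / step)
    return [start + k * step for k in range(max(count, 0))]


# The same code serves plain integers, exact fractions and calendar dates.
print(generic_arange(0, 10, 3))
print(generic_arange(Fraction(0), Fraction(2), Fraction(1, 2)))
print(generic_arange(date(2011, 6, 1), date(2011, 6, 4), timedelta(days=1)))
```

A hypothetical distance type would only need to supply those four operations. As with np.arange, a floating-point step can give an off-by-one count because of rounding in the division.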
I have been following the multiple date/time discussions with some
interest, as it is clear there is not 'one way' (perhaps it's Dutch). But
I do keep coming back to Chris's concepts of time as a strict unit of
measure and time as a calendar. So I do think these types of changes are
rather premature without defining some base measurement of time, probably
something like Unix time or International Atomic Time (TAI), but not UTC
due to leap seconds (http://en.wikipedia.org/wiki/Leap_second).
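For comparison with the Unix-time/TAI suggestion, the sketch below checks how current NumPy releases (not necessarily the 2011 branch under review) represent datetime64: an integer count from the 1970-01-01 Unix epoch, with every day assumed to be exactly 86400 seconds, so leap seconds are not represented at all.

```python
import numpy as np

# datetime64 stores a signed 64-bit integer count of its unit (seconds here)
# relative to the 1970-01-01T00:00:00 epoch, the same origin as Unix time.
epoch = np.datetime64('1970-01-01T00:00:00', 's')
new_year_2009 = np.datetime64('2009-01-01T00:00:00', 's')

# Casting to int64 exposes the raw count of seconds since the epoch.
unix_seconds = new_year_2009.astype(np.int64)

# Every day is assumed to be 86400 s long, so December 31, 2008 (which ended
# with a leap second in UTC) still measures 86400 s here.
dec31_length = new_year_2009 - np.datetime64('2008-12-31T00:00:00', 's')

print(unix_seconds, dec31_length)
```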

Leap seconds make using UTC rather problematic for a couple of reasons:
1) It's essentially only historical. A range of seconds in December 2011
computed 'now' in June 2011 using UTC might differ from the same range
calculated in a couple of weeks if a leap second is added to December 2011.
2) There is also the issue that 23:59:60 December 31, 2008 UTC is a
valid time, while 23:59:60 on December 31 of other years, such as 2009
and 2010, is not. It also means that you have to be careful with
experiments that require accuracy of a second or less, because a 1 second
gap could be recorded as a 2 second gap.
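The one-second discrepancy in point 2 can be sketched with Python's standard datetime module, which, like Unix time, counts 86400 seconds per day. The leap-second table below is a deliberately incomplete subset used only for illustration (a real implementation would load the full IERS/IANA list), and the function names posix_elapsed and si_elapsed are hypothetical.

```python
from datetime import datetime, timezone

UTC = timezone.utc

# Illustrative subset of the leap-second table, NOT a complete list: each entry
# is the UTC midnight immediately preceded by an inserted second (23:59:60).
LEAP_SECOND_MIDNIGHTS = [
    datetime(2006, 1, 1, tzinfo=UTC),
    datetime(2009, 1, 1, tzinfo=UTC),
    datetime(2012, 7, 1, tzinfo=UTC),
]


def posix_elapsed(a, b):
    """Seconds from a to b as Unix time and Python's datetime count them."""
    return (b - a).total_seconds()


def si_elapsed(a, b):
    """Elapsed SI seconds: add one for each leap second falling between a and b."""
    leaps = sum(1 for m in LEAP_SECOND_MIDNIGHTS if a < m <= b)
    return posix_elapsed(a, b) + leaps


# Readings stamped 23:59:59 and 00:00:00 across the 2008 leap second.
a = datetime(2008, 12, 31, 23, 59, 59, tzinfo=UTC)
b = datetime(2009, 1, 1, tzinfo=UTC)
print(posix_elapsed(a, b), si_elapsed(a, b))
```

The two counts disagree by one second whenever an interval spans a leap second, which is exactly the kind of error that matters for sub-second measurements.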

The other issue is how to define the np.arange step argument, since it
can be in different scales such as months, years, or seconds. Can a
user specify days and get half-days (like 1.5 days), or must these be
'integer' days?
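One possible answer, as current NumPy behaves (which may differ from the branch under review): the step is a timedelta64 carrying its own unit, so monthly and hourly ranges are both expressible, and a fractional step such as 1.5 days can be rewritten as an integer count of a finer unit (36 hours). The last range emulates the proposed np.arange(datetime, timedelta, timedelta) form by adding the duration to the start; the sketch does not assume np.arange accepts a timedelta as its stop argument.

```python
import numpy as np

# Monthly steps: datetime64[M] endpoints with a one-month timedelta64 step.
months = np.arange(np.datetime64('2011-01'), np.datetime64('2011-07'),
                   np.timedelta64(1, 'M'))

# "1.5 days" expressed as an integer count of a finer unit: 36 hours.
start = np.datetime64('2011-06-01', 'h')
steps_36h = np.arange(start, start + np.timedelta64(6, 'D'),
                      np.timedelta64(36, 'h'))

# (datetime, timedelta, timedelta) emulated by turning the duration into an
# explicit stop value: one week of daily values starting 2011-06-01.
first_day = np.datetime64('2011-06-01')
week = np.arange(first_day, first_day + np.timedelta64(7, 'D'),
                 np.timedelta64(1, 'D'))

print(months)
print(steps_36h)
print(week)
```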
