[Numpy-discussion] code review for datetime arange

Mark Wiebe mwwiebe@gmail....
Fri Jun 10 09:18:31 CDT 2011


On Fri, Jun 10, 2011 at 12:56 AM, Ralf Gommers
<ralf.gommers@googlemail.com> wrote:

>
>
> On Fri, Jun 10, 2011 at 1:54 AM, Mark Wiebe <mwwiebe@gmail.com> wrote:
>
>> On Thu, Jun 9, 2011 at 5:21 PM, Ralf Gommers <ralf.gommers@googlemail.com
>> > wrote:
>>
>>>
>>>
>>> On Thu, Jun 9, 2011 at 11:54 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
>>>
>>>> On Thu, Jun 9, 2011 at 4:27 PM, Ralf Gommers <
>>>> ralf.gommers@googlemail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Jun 9, 2011 at 10:58 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
>>>>>
>>>>>> On Thu, Jun 9, 2011 at 3:41 PM, Christopher Barker <
>>>>>> Chris.Barker@noaa.gov> wrote:
>>>>>>
>>>>> Your branch works fine for me (OS X, py2.6), no failures. Only a few
>>>>> deprecation warnings like:
>>>>> /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py:336:
>>>>> DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they
>>>>> are platform specific. Use 'O' instead
>>>>>   callableObj(*args, **kwargs)
>>>>>
>>>>
>>>> It looks like there are some '|O4' dtypes in 'lib/tests/test_format.py',
>>>> testing the .npy file format. I'm not sure why I'm not getting this warning
>>>> though.
>>>>
>>>>
>>>>>  Mark Wiebe wrote:
>>>>>>> > Because of the nature of datetime and timedelta, arange has to be
>>>>>>> > slightly different than with all the other types. In particular, for
>>>>>>> > datetime the primary signature is
>>>>>>> > np.arange(datetime, datetime, timedelta).
>>>>>>> >
>>>>>>> > I've implemented a simple extension which allows for another way to
>>>>>>> > specify a date range, as np.arange(datetime, timedelta, timedelta).
>>>>>>>
>>>>> Did you think about how to document which of these basic functions
>>>>> work with datetime? I don't think that belongs in the docstrings, but it may
>>>>> then be hard for the user to figure out which functions accept datetimes.
>>>>> And there will be no usage examples in the docstrings.
>>>>>
>>>>
>>>>  I think documenting it in a 'datetime' section of the arange
>>>> documentation would be reasonable. The main datetime documentation page
>>>> would also mention the functions that are most useful.
>>>>
>>>>> Besides docs, I am not sure about your choice to modify functions like
>>>>> arange instead of writing a module of wrapper functions for them that know
>>>>> what to do with the dtype. If you have a module you can group all relevant
>>>>> functions, so they're easy to find. Plus it's more future-proof - if at some
>>>>> point numpy grows another new dtype, just create a new module with wrapper
>>>>> funcs for that dtype.
>>>>>
>>>>
>>>> The facts that datetime and timedelta are related in a particular way
>>>> different from other data types, and that they are parameterized types, both
>>>> contribute to them not fitting naturally the current structure of NumPy. I'm
>>>> not sure I understand the module idea,
>>>>
>>>
>>> Basically, use np.datetime.arange, which understands the dtype and then
>>> calls np.arange under the hood. Or it is just its own function, like the dtrange()
>>> Robert just suggested. It's pretty much the same as for the ma module, which
>>> reimplements or wraps many numpy functions that do not understand masked
>>> arrays.
>>>
>>
>> I'm not a big fan of the way the ma module works; it doesn't integrate
>> naturally and orthogonally with all the other features of NumPy. It's also
>> an array subtype, quite different from a dtype. We don't have
>> np.bool.arange, np.int8.arange, etc., and the abstraction used by arange
>> built into the custom data type mechanism is too weak to support the needs
>> of datetime.
>>
>>
>> I'd like to use the requirements of datetime as a guide to molding the
>> future design of the data type system, and if we make datetime a
>> second-class citizen because it doesn't behave like a float, we're not going
>> to be able to discover the possibilities.
>>
>>
>>>> I would rather think that since it's a built-in NumPy data type, it
>>>> should work with the regular NumPy functions wherever that makes sense.
>>>>
>>>
>>> That doesn't make sense to me. Being a dtype that happens to be shipped
>>> with numpy doesn't make it more special than other dtypes.
>>>
>>
>>  This isn't making it more special, it's just conforming the natural NumPy
>> way to how datetime/timedelta operates.
>>
>
> Maybe I'm misunderstanding this, and once you make a function work for
> datetime it would also work for other new dtypes. But my impression is that
> that's not the case. Let's say I make a new dtype with distance instead of
> time attached. Would I be able to use it with arange, or would I have to go
> in and change the arange implementation again to support it?
>

OK, I think I understand the point you're driving at now. NumPy needs to be
flexible enough that external plugins defining custom data types can do the
same things datetime does; having datetime be special compared to those is
undesirable. I wholeheartedly agree with this, and the way I'm coding
datetime is driving in that direction.

The state of the NumPy codebase, however, prevents jumping straight to such
a solution. There are already several mechanisms and layers of abstraction
for this, and none of them satisfies the needs of datetime and other similar
types. Also, arange is just one of a large number of functions which could be
extended individually for different types. For example, if one wanted to make
a unit quaternion data type for manipulating rotations, its needs would be
significantly different. Because I can't see the big picture from within the
world of datetime, and because such generalization takes a great amount of
effort, I'm instead making these changes minimally invasive on the current
codebase.
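
For concreteness, here is a minimal sketch of the primary signature discussed
earlier in the thread, np.arange(datetime, datetime, timedelta). It assumes a
NumPy build in which np.datetime64 and np.timedelta64 accept the ISO date
string and the (count, unit) arguments shown; the dates themselves are
arbitrary illustrations. Note that, unlike with other dtypes, the step has a
different type than the start and stop values.

```python
import numpy as np

# Start and stop are datetimes; the step is a timedelta.
start = np.datetime64('2011-06-01')
stop = np.datetime64('2011-06-05')
step = np.timedelta64(1, 'D')

# As with other dtypes, the stop value is excluded from the result.
days = np.arange(start, stop, step)
print(days)
```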

This approach is along the lines of "lifting" in generic programming. First
you write your algorithm in a specific domain and work out how it behaves
there. Then you determine the minimal requirements on the types involved and
abstract the algorithm accordingly. Jumping directly to the second step is
generally too difficult; an incremental path to the final goal must be used.

http://www.generic-programming.org/about/intro/lifting.php
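
As a rough illustration of the two steps, here is a pure-Python sketch. The
names arange_float and arange_lifted are hypothetical and are not part of
NumPy or of the datetime branch; both versions assume a positive step. The
lifted version requires only that `value + step` and `value < stop` are
defined for the types involved, so the same loop works for dates and
timedeltas.

```python
from datetime import date, timedelta


def arange_float(start, stop, step):
    """Step 1: the algorithm written for one specific domain (floats)."""
    out = []
    value = float(start)
    while value < stop:
        out.append(value)
        value += step
    return out


def arange_lifted(start, stop, step):
    """Step 2: the same loop, lifted to any types supporting
    `value + step` and `value < stop` (positive steps only)."""
    out = []
    value = start
    while value < stop:
        out.append(value)
        value = value + step
    return out


# The lifted version handles a date range with a timedelta step.
print(arange_lifted(date(2011, 6, 1), date(2011, 6, 4), timedelta(days=1)))
```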

Cheers,
Mark


>
> Ralf
>