[Numpy-discussion] suggestion for generalizing numpy functions

Darren Dale dsdale24@gmail....
Wed May 27 10:30:52 CDT 2009


Now that numpy-1.3 has been released, I was hoping I could engage the numpy
developers and community concerning my suggestion to improve the ufunc
wrapping mechanism. Currently, ufuncs call, on the way out, the
__array_wrap__ method of the input array with the highest
__array_priority__.
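
To make the current behaviour concrete, here is a pure-Python model of that selection step. It is a sketch, not NumPy's C implementation. The classes Plain and Tagged and the function apply_ufunc are hypothetical. The context tuple only loosely mirrors the (function, inputs, output index) information that __array_wrap__ receives. The point it shows is that the winning input's hook runs only after the result has been computed.

```python
import math


class Plain:
    """Hypothetical stand-in for a base array with a low priority."""

    __array_priority__ = 0.0

    def __init__(self, data):
        self.data = list(data)

    def __array_wrap__(self, out, context=None):
        return Plain(out)


class Tagged(Plain):
    """Hypothetical subclass that carries a unit tag through operations."""

    __array_priority__ = 10.0

    def __init__(self, data, tag="m"):
        super().__init__(data)
        self.tag = tag

    def __array_wrap__(self, out, context=None):
        # Runs only after the raw result has already been computed.
        return Tagged(out, self.tag)


def apply_ufunc(func, *inputs):
    """Model of the current behaviour: compute first, wrap on the way out."""
    # Elementwise computation on the raw data; no hook runs before this.
    raw = [func(*vals) for vals in zip(*(x.data for x in inputs))]
    # The input with the highest __array_priority__ wraps the result
    # (max() keeps the first such input on ties).
    winner = max(inputs, key=lambda x: getattr(x, "__array_priority__", 0.0))
    return winner.__array_wrap__(raw, context=(func, inputs, 0))


result = apply_ufunc(math.hypot, Plain([3.0, 5.0]), Tagged([4.0, 12.0]))
print(type(result).__name__, result.tag, result.data)  # Tagged m [5.0, 13.0]
```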

There are use cases, like masked arrays or arrays with units, where it is
imperative to run some code on the way in to the ufunc as well. MaskedArrays
do this by reimplementing or wrapping ufuncs, but this approach puts some
pretty severe constraints on subclassing. For example, in my Quantities
package I have a Quantity object that derives from ndarray. It has been
suggested that in order to make ufuncs work with Quantity, I should wrap
numpy's built-in ufuncs. But I intend to make a MaskedQuantity object as
well, deriving from MaskedArray, and would therefore have to wrap the
MaskedArray ufuncs as well.

If ufuncs would simply call a method both on the way in and on the way out,
I think this would go a long way to improving this situation. I whipped up a
simple proof of concept and posted it in this thread a while back. For
example, a MaskedQuantity would implement a method like __gfunc_pre__ to
check that the units operation is valid, and would then call
MaskedArray.__gfunc_pre__ (if it is defined) to determine the domain, and so
on.
__gfunc_pre__ would return a dict containing any metadata the subclasses
wish to provide based on the inputs, and that dict would be passed along
with the inputs, output and context to __gfunc_post__, so postprocessing can
be done (__gfunc_post__ replacing __array_wrap__).
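
A minimal sketch of that protocol, assuming the hook names proposed above and a simplified units rule (operands must share units). The classes Array, MaskedArray and MaskedQuantity here are hypothetical pure-Python stand-ins, not numpy.ma or Quantities, and call_gfunc is an illustrative driver. Each subclass extends the metadata dict returned by its parent's __gfunc_pre__, and each __gfunc_post__ consumes its own entries, so MaskedQuantity builds on MaskedArray without reimplementing any ufunc.

```python
import operator


class Array:
    """Hypothetical minimal base class: a list of floats."""

    def __init__(self, data):
        self.data = list(data)

    def __gfunc_pre__(self, func, inputs):
        return {}  # the base class contributes no metadata

    def __gfunc_post__(self, out, func, inputs, meta):
        # Build a bare instance of the caller's class; subclasses fill in
        # their own attributes from the metadata.
        obj = object.__new__(type(self))
        obj.data = list(out)
        return obj


class MaskedArray(Array):
    def __init__(self, data, mask=None):
        super().__init__(data)
        self.mask = list(mask) if mask is not None else [False] * len(self.data)

    def __gfunc_pre__(self, func, inputs):
        meta = super().__gfunc_pre__(func, inputs)
        # An element is masked if it is masked in any masked input.
        masks = [x.mask for x in inputs if isinstance(x, MaskedArray)]
        meta["mask"] = [any(flags) for flags in zip(*masks)]
        return meta

    def __gfunc_post__(self, out, func, inputs, meta):
        obj = super().__gfunc_post__(out, func, inputs, meta)
        obj.mask = meta["mask"]
        return obj


class MaskedQuantity(MaskedArray):
    def __init__(self, data, units, mask=None):
        super().__init__(data, mask)
        self.units = units

    def __gfunc_pre__(self, func, inputs):
        # Check the units operation first (simplified rule: units must match),
        # then let MaskedArray add its own metadata.
        units = {x.units for x in inputs if isinstance(x, MaskedQuantity)}
        if len(units) != 1:
            raise ValueError(f"incompatible units: {sorted(units)}")
        meta = super().__gfunc_pre__(func, inputs)
        meta["units"] = units.pop()
        return meta

    def __gfunc_post__(self, out, func, inputs, meta):
        obj = super().__gfunc_post__(out, func, inputs, meta)
        obj.units = meta["units"]
        return obj


def call_gfunc(func, *inputs):
    owner = inputs[0]  # simplification: the first input drives the hooks
    meta = owner.__gfunc_pre__(func, inputs)  # on the way in
    out = [func(*vals) for vals in zip(*(x.data for x in inputs))]
    return owner.__gfunc_post__(out, func, inputs, meta)  # on the way out


a = MaskedQuantity([1.0, 2.0, 3.0], "m", mask=[False, True, False])
b = MaskedQuantity([10.0, 20.0, 30.0], "m", mask=[False, False, True])
c = call_gfunc(operator.add, a, b)
print(type(c).__name__, c.data, c.mask, c.units)
```

Because the units check runs before MaskedArray's hook and before the computation, an invalid operation is rejected before any work is done.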

Of course, packages like MaskedArray may still wish to reimplement ufuncs,
as Eric Firing is investigating right now. The point is that classes that
don't care about how ufuncs are implemented, and only need to provide
metadata based on the inputs and the output, can do so using this mechanism
and can build upon other specialized arrays.

I would really appreciate input from numpy developers and other interested
parties. I would like to continue developing the Quantities package this
summer, and have been approached by numerous people interested in using
Quantities with Sage, SymPy and matplotlib. But I would prefer to improve the
ufunc mechanism (or establish that there is no interest among the community
to do so) so I can improve the package (or limit its scope) before making an
official announcement.

Thank you,
Darren


On Mon, Mar 9, 2009 at 5:37 PM, Darren Dale <dsdale24@gmail.com> wrote:

> On Mon, Mar 9, 2009 at 9:50 AM, Darren Dale <dsdale24@gmail.com> wrote:
>
>> I spent some time over the weekend fixing a few bugs in numpy that were
>> exposed when attempting to use ufuncs with ndarray subclasses. It got me
>> thinking that, with relatively little work, numpy's functions could be made
>> to be more general. For example, the numpy.ma module redefines many of
>> the standard ufuncs in order to do some preprocessing before the builtin
>> ufunc is called. Likewise, in the units/quantities package I have been
>> working on, I would like to perform a dimensional analysis to make sure an
>> operation is allowed before I call a ufunc that might change data in place.
>>
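Why the check has to happen on the way in can be shown with a small model of an in-place addition. The Quantity class, check_add and the two iadd_* functions are hypothetical, and the "units must match" rule is a simplification. A check that runs only on the way out (the position __array_wrap__ occupies) fires after the target has already been modified. The same check run first leaves the data untouched.

```python
class Quantity:
    """Hypothetical quantity: a list of floats plus a unit string."""

    def __init__(self, data, units):
        self.data = list(data)
        self.units = units


def check_add(a, b):
    """Dimensional analysis for addition (simplified: units must match)."""
    if a.units != b.units:
        raise ValueError(f"cannot add {b.units!r} to {a.units!r}")
    return {"units": a.units}


def iadd_post_only(a, b):
    """Validate only on the way out, as an __array_wrap__-style hook would."""
    for i, v in enumerate(b.data):
        a.data[i] += v  # a is modified first...
    check_add(a, b)  # ...and only then validated
    return a


def iadd_pre_checked(a, b):
    """Validate on the way in, before touching a's data."""
    meta = check_add(a, b)
    for i, v in enumerate(b.data):
        a.data[i] += v
    a.units = meta["units"]
    return a


seconds = Quantity([1.0, 1.0], "s")

x = Quantity([1.0, 2.0], "m")
try:
    iadd_post_only(x, seconds)
except ValueError:
    pass
print(x.data)  # [2.0, 3.0]: the error came too late

y = Quantity([1.0, 2.0], "m")
try:
    iadd_pre_checked(y, seconds)
except ValueError:
    pass
print(y.data)  # [1.0, 2.0]: nothing was changed
```
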
>> Imagine an ndarray subclass with methods like __gfunc_pre__ and
>> __gfunc_post__. __gfunc_pre__ could accept the context that is currently
>> provided to __array_wrap__ (the inputs and the function called), perform
>> whatever preprocessing is desired, and maybe return a dictionary containing
>> metadata. Numpy functions could then be wrapped with a decorator that 1)
>> calls __gfunc_pre__ and obtains any metadata that is returned, 2) calls the
>> wrapped function, and then 3) calls __gfunc_post__, which might be very
>> similar to __array_wrap__ except that it would also accept the metadata
>> created by __gfunc_pre__.
>>
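The three steps can be sketched as a plain Python decorator. The decorator name gfunc, the hook signatures and the Quantity class are hypothetical. For brevity the sketch assumes the first argument defines both hooks. The decorated function, diff, is deliberately not a ufunc, to show that the approach is not limited to ufuncs.

```python
import functools


class Quantity:
    """Hypothetical quantity: a list of floats plus a unit string."""

    def __init__(self, data, units):
        self.data = list(data)
        self.units = units

    def __gfunc_pre__(self, func, *args):
        # Preprocessing hook: here it only records the units.
        return {"units": self.units}

    def __gfunc_post__(self, result, context, meta):
        # Postprocessing hook: like __array_wrap__, plus the metadata.
        return Quantity(result, meta["units"])


def gfunc(func):
    """Hypothetical decorator implementing the three steps."""

    @functools.wraps(func)
    def wrapper(obj, *args, **kwargs):
        # 1) call __gfunc_pre__ and keep the metadata it returns
        meta = obj.__gfunc_pre__(func, obj, *args)
        # 2) call the wrapped function
        result = func(obj, *args, **kwargs)
        # 3) call __gfunc_post__ with the result, the context and the metadata
        return obj.__gfunc_post__(result, (func, (obj, *args)), meta)

    return wrapper


@gfunc
def diff(q):
    """A non-ufunc example: differences between consecutive elements."""
    d = q.data
    return [b - a for a, b in zip(d, d[1:])]


r = diff(Quantity([1.0, 4.0, 9.0], "s"))
print(r.data, r.units)  # [3.0, 5.0] s
```
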
>> In cases where the routines to be called by __gfunc_pre__ and __gfunc_post__
>> depend on which function is called, the subclass could implement routines
>> and store them in a dictionary-like object that is keyed using the function
>> called. I have been exploring this approach with Quantities and it seems to
>> work well. For example:
>>
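>>     def __gfunc_pre__(self, gfunc, *args):
>>         # Look up the preprocessing routine registered for this function;
>>         # functions with no registered routine contribute no metadata.
>>         routine = gfunc_pre_registry.get(gfunc)
>>         if routine is None:
>>             return {}
>>         # Calling outside the lookup avoids masking KeyErrors raised inside.
>>         return routine(*args)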
>>
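A self-contained sketch of such a registry, assuming routines are keyed by the function object itself. The names gfunc_pre_registry, register_pre, _add_pre and _mul_pre are hypothetical, and the unit rules are deliberately simplified. Using dict.get rather than wrapping the call in try/except KeyError means a KeyError raised inside a routine is not mistaken for "no routine registered".

```python
import operator

# Hypothetical registry mapping a function to its preprocessing routine.
gfunc_pre_registry = {}


def register_pre(func):
    """Decorator that files a routine under the function it preprocesses."""

    def decorator(routine):
        gfunc_pre_registry[func] = routine
        return routine

    return decorator


@register_pre(operator.add)
def _add_pre(a, b):
    if a.units != b.units:
        raise ValueError(f"incompatible units: {a.units} and {b.units}")
    return {"units": a.units}


@register_pre(operator.mul)
def _mul_pre(a, b):
    return {"units": f"{a.units}*{b.units}"}


class Quantity:
    def __init__(self, data, units):
        self.data = list(data)
        self.units = units

    def __gfunc_pre__(self, gfunc, *args):
        routine = gfunc_pre_registry.get(gfunc)
        if routine is None:
            return {}  # unregistered functions contribute no metadata
        return routine(*args)


m = Quantity([1.0], "m")
s = Quantity([2.0], "s")
print(m.__gfunc_pre__(operator.mul, m, s))  # {'units': 'm*s'}
print(m.__gfunc_pre__(operator.sub, m, s))  # {}
```
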
>> I think such an approach for generalizing numpy's functions could be
>> implemented without being disruptive to the existing __array_wrap__
>> framework. The decorator would attempt to identify an input or output array
>> to use to call __gfunc_pre__ and __gfunc_post__. If it finds one, it calls
>> them. If it doesn't, no harm is done: the existing __array_wrap__ mechanism
>> is still in place if the wrapped function is a ufunc.
>>
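The lookup and the "no harm done" fallback might look like this. The search order (positional inputs first, then an out keyword argument) is an assumption, as are the names _find_hook_owner, gfunc, Tagged and scale. When no argument defines the hooks, the decorator simply calls the function unchanged.

```python
import functools


def _find_hook_owner(args, kwargs):
    """Return the first input, then the `out` argument, defining both hooks."""
    for candidate in (*args, kwargs.get("out")):
        cls = type(candidate)
        if hasattr(cls, "__gfunc_pre__") and hasattr(cls, "__gfunc_post__"):
            return candidate
    return None


def gfunc(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        owner = _find_hook_owner(args, kwargs)
        if owner is None:
            # No hook-bearing array found: behave exactly like the plain function.
            return func(*args, **kwargs)
        meta = owner.__gfunc_pre__(func, *args)
        result = func(*args, **kwargs)
        return owner.__gfunc_post__(result, (func, args), meta)

    return wrapper


class Tagged(list):
    """Hypothetical list subclass that defines both hooks."""

    def __init__(self, data, tag):
        super().__init__(data)
        self.tag = tag

    def __gfunc_pre__(self, func, *args):
        return {"tag": self.tag}

    def __gfunc_post__(self, result, context, meta):
        return Tagged(result, meta["tag"])


@gfunc
def scale(values, factor, out=None):
    result = [v * factor for v in values]
    if out is not None:
        out[:] = result
        return out
    return result


plain = scale([1, 2], 3)               # no hooks anywhere: a plain list
tagged = scale(Tagged([1, 2], "a"), 3)  # hooks found on an input
via_out = scale([1, 2], 3, out=Tagged([0, 0], "b"))  # hooks found on `out`
```
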
>> One other nice feature: the metadata that is returned by __gfunc_pre__
>> could contain an optional flag that the decorator attempts to pass to the
>> wrapped function so that __gfunc_pre__ and __gfunc_post__ are not called for
>> any decorated internal functions. That way the subclass could specify that
>> __gfunc_pre__ and __gfunc_post__ should be called only for the outer-most
>> function.
>>
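One way to get the "outer-most function only" behaviour, sketched under stated assumptions. Instead of passing a flag into the wrapped function as described above, this sketch swaps in a contextvars.ContextVar. The decorator sets it while the outer call runs, so inner decorated calls see it and skip the hooks. The metadata key "outermost_only", the Quantity class and the demo functions are hypothetical.

```python
import contextvars
import functools

# Set while an outer call has asked that inner decorated calls skip the hooks.
_suppress_hooks = contextvars.ContextVar("_suppress_hooks", default=False)
calls = []  # records hook invocations for the demonstration


class Quantity(list):
    """Hypothetical list subclass with units and both hooks."""

    def __init__(self, data, units, outermost_only=True):
        super().__init__(data)
        self.units = units
        self.outermost_only = outermost_only

    def __gfunc_pre__(self, func, *args):
        calls.append(("pre", func.__name__))
        return {"units": self.units, "outermost_only": self.outermost_only}

    def __gfunc_post__(self, result, func, meta):
        calls.append(("post", func.__name__))
        return Quantity(result, meta["units"], meta["outermost_only"])


def gfunc(func):
    @functools.wraps(func)
    def wrapper(*args):
        owner = next((a for a in args if hasattr(type(a), "__gfunc_pre__")), None)
        if owner is None or _suppress_hooks.get():
            return func(*args)  # no owner, or an outer call asked us to skip
        meta = owner.__gfunc_pre__(func, *args)
        token = _suppress_hooks.set(meta.get("outermost_only", False))
        try:
            result = func(*args)
        finally:
            _suppress_hooks.reset(token)  # restore the previous setting
        return owner.__gfunc_post__(result, func, meta)

    return wrapper


@gfunc
def double(x):
    return [2 * v for v in x]


@gfunc
def quadruple(x):
    return double(double(x))  # calls another decorated function internally


r = quadruple(Quantity([1, 2], "m"))
print(r, r.units, calls)
```

With outermost_only=False the inner double calls run their own hooks as well, which is the behaviour the flag is meant to switch off.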
>> Comments?
>>
>
> I'm attaching a proof of concept script, maybe it will better illustrate
> what I am talking about.
>