[Numpy-discussion] Generalized ufuncs?
Charles R Harris
Sun Aug 17 20:15:36 CDT 2008
On Sun, Aug 17, 2008 at 6:13 PM, Engel, Hans-Andreas <
> I am sorry that our submission
> http://projects.scipy.org/scipy/numpy/ticket/887 has created some
> annoyance; presumably we have taken the "Make contributions (e.g. code
> patches), (...) by submitting a 'ticket' on the Trac pages linked below"
> on http://scipy.org/Developer_Zone somewhat too literally.
> It was great to receive very positive responses to our patch and to
> receive a very timely review; this is encouraging for submitting code to
> the numpy repository.
> I would like to add that the proposed change is not that arbitrary; it
> is a well-established concept -- it is outlined on the
> GeneralLoopingFunctions wiki page, and it is a prominent concept of
> perl's PDL vector library. Of course, there is still room for arguing
> about details.
> The fact that no explicit "generalized" ufuncs are provided, should in
> my opinion not be an argument why not to include the change in the 1.2.0
> release. Writing extension libraries that implement such generalized
> ufuncs, while being able to use the standard numpy distribution, would
> certainly be very valuable.
> Furthermore, the risk for including the proposed patch in 1.2.0 is very
> low: existing functionality is not touched. (Except the glitch we had
> by declaring variables in a gcc-way.) For standard ufuncs, it should be
> straightforward to see that the patch does not modify the behavior at
> I wish everyone a great SciPy'08 conference and very much hope that you
> can include the proposed functionality in the forthcoming NumPy release.
For those interested in the description posted with ticket
here it is. The patch can be found with the ticket. I note that
PyUFunc_FromFuncAndDataAndSignature is not put at the end of the
ufunc_api_order.txt list and might break the current API, which has entries
#define PyUFunc_Type (*(PyTypeObject *)PyUFunc_API)
So we need to keep the PyUFunc_API entries in order.
Generalized Universal Functions
There is a general need for looping over not only functions on scalars but
also over functions on vectors (or arrays), as explained on
http://scipy.org/scipy/numpy/wiki/GeneralLoopingFunctions. We propose to
realize this concept by generalizing the universal functions (ufuncs), and
provide a C implementation that adds ~500 lines to the numpy code base. In
current (specialized) ufuncs, the elementary function is limited to
element-by-element operations, whereas the generalized version supports
"sub-array" by "sub-array" operations. The Perl vector library PDL provides
a similar functionality and its terms are re-used in the following.
Each generalized ufunc has information associated with it that states what
the "core" dimensionality of the inputs is, as well as the corresponding
dimensionality of the outputs (the element-wise ufuncs have zero core
dimensions). The list of the core dimensions for all arguments is called the
"signature" of a ufunc. For example, the ufunc numpy.add has signature
"(),()->()" defining two scalar inputs and one scalar output.
Another example is (see the
inner1d(a,b) with a signature of "(i),(i)->()". This applies the inner
product along the last axis of each input, but keeps the remaining indices
intact. For example, where a is of shape (3,5,N) and b is of shape (5,N),
this will return an output of shape (3,5). The underlying elementary
function is called 3*5 times. In the signature, we specify one core
dimension "(i)" for each input and zero core dimensions "()" for the output,
since it takes two 1-d arrays and returns a scalar. By using the same name
"i", we specify that the two corresponding dimensions should be of the same
size (or one of them is of size 1 and will be broadcasted).
The dimensions beyond the core dimensions are called "loop" dimensions. In
the above example, this corresponds to (3,5).
The usual numpy "broadcasting" rules apply, where the signature determines
how the dimensions of each input/output object are split into core and loop
1. While an input array has a smaller dimensionality than the
corresponding number of core dimensions, 1's are pre-pended to its shape.
2. The core dimensions are removed from all inputs and the remaining
dimensions are broadcasted; defining the loop dimensions.
3. The output is given by the loop dimensions plus the output core
Definitions ¶ <http://scipy.org/scipy/numpy/ticket/887#Definitions>
Each ufunc consists of an elementary function that performs the most basic
operation on the smallest portion of array arguments (e.g. adding two
numbers is the most basic operation in adding two arrays). The ufunc applies
the elementary function multiple times on different parts of the arrays. The
input/output of elementary functions can be vectors; e.g., the elementary
function of inner1d takes two vectors as input.
A signature is a string describing the input/output dimensions of the
elementary function of a ufunc. See section below for more details.
The dimensionality of each input/output of an elementary function is
defined by its core dimensions (zero core dimensions correspond to a scalar
input/output). The core dimensions are mapped to the last dimensions of the
A dimension name represents a core dimension in the signature. Different
dimensions may share a name, indicating that they are of the same size (or
A dimension index is an integer representing a dimension name. It
enumerates the dimension names according to the order of the first
occurrence of each name in the signature.
Details of Signature
The signature defines "core" dimensionality of input and output variables,
and thereby also defines the contraction of the dimensions. The signature is
represented by a string of the following format:
- Core dimensions of each input or output array are represented by a list
of dimension names in parentheses, "(i_1,...,i_N)"; a scalar input/output
is denoted by "()". Instead of "i_1", "i_2", etc, one can use any valid
Python variable name.
- Dimension lists for different arguments are separated by ",".
Input/output arguments are separated by "->".
- If one uses the same dimension name in multiple locations, this
enforces the same size (or broadcastable size) of the corresponding
The formal syntax of signatures is as follows.
<Signature> ::= <Input arguments> "->" <Output arguments>
<Input arguments> ::= <Argument list>
<Output arguments> ::= <Argument list>
<Argument list> ::= nil | <Argument> | <Argument> "," <Argument list>
<Argument> ::= "(" <Core dimension list> ")"
<Core dimension list> ::= nil | <Dimension name> |
<Dimension name> "," <Core dimension list>
<Dimension name> ::= valid Python variable name
1. All quotes are for clarity.
2. Core dimensions that share the same name must be broadcastable, as the
two "i"s in our example above. Each dimension name typically
corresponding to one level of looping in the elementary function's
3. White spaces are ignored.
Here are some examples of signatures.
dot2d "(m,n),(n,p)->(m,p)" (matrix multiplication) outer_inner
"(i,t),(j,t)->(i,j)" (inner over the last dimension, outer over the second
to last, and loop/broadcast over the rest.)
C-API for implementing Elementary Functions
The current interface remains unchanged, and PyUFunc_FromFuncAndData can
still be used to implement (specialized) ufuncs, consisting of scalar
One can use PyUFunc_FromFuncAndDataAndSignature to declare a more general
ufunc. The argument list is the same as PyUFunc_FromFuncAndData, with an
additional argument specifying the signature as C string.
Furthermore, the callback function is of the same type as before, void
(*foo)(char **args, intp *dimensions, intp *steps, void *func). When
invoked, args is a list of length nargs containing the data of all
input/output arguments. For a scalar elementary function, steps is also of
length nargs, denoting the strides used for the arguments. dimensions is a
pointer to a single integer defining the size of the axis to be looped over.
For a non-trivial signature, dimensions will also contain the sizes of the
core dimensions as well, starting at the second entry. Only one size is
provided for each unique dimension name and the sizes are given according to
the first occurrence of a dimension name in the signature.
The first nargs elements of steps remain the same as for scalar ufuncs. The
following elements contain the strides of all core dimensions for all
arguments in order.
For example, consider a ufunc with signature "(i,j),(i)->()". In this case,
args will contain three pointers to the data of the input/output arrays a, b,
c. Furthermore, dimensions will be [N, I, J] to define the size of N of the
loop and the sizes I and J for the core dimensions i and j. Finally,
stepswill be [a_N,
b_N, c_N, a_i, a_j, b_i], containing all necessary strides.
License ¶ <http://scipy.org/scipy/numpy/ticket/887#License>
The attached code and the above documentation was written by Wenjie Fu <
firstname.lastname@example.org> and Hans-Andreas Engel <email@example.com>, and is provided
under the numpy license:
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Numpy-discussion