[Numpy-discussion] Proposal for new ufunc functionality
Mon Apr 12 17:31:16 CDT 2010
On Mon, Apr 12, 2010 at 17:26, Travis Oliphant <email@example.com> wrote:
> On Apr 11, 2010, at 2:56 PM, Anne Archibald wrote:
> 2010/4/10 Stéfan van der Walt <firstname.lastname@example.org>:
> On 10 April 2010 19:45, Pauli Virtanen <email@example.com> wrote:
> Another addition to ufuncs that should be thought about is specifying the
> Python-side interface to generalized ufuncs.
> This is an interesting idea; what do you have in mind?
> I can see two different kinds of answer to this question: one is a
> tool like vectorize/frompyfunc that allows construction of generalized
> ufuncs from python functions, and the other is thinking out what
> methods and support functions generalized ufuncs need.
> The former would be very handy for prototyping gufunc-based libraries
> before delving into the templated C required to make them actually
> The latter is more essential in the long run: it'd be nice to have a
> reduce-like function, but obviously only when the arity and dimensions
> work out right (which I think means (shape1,shape2)->(shape2) ). This
> could be applied along an axis or over a whole array. reduceat and the
> other, more sophisticated, schemes might also be worth supporting. At
> a more elementary level, gufunc objects should have good introspection
> - docstrings, shape specification accessible from python, named formal
> arguments, et cetera. (So should ufuncs, for that matter.)
> We should collect all of these proposals into a NEP. To clarify what I
> mean by "group-by" behavior:
> Suppose I have an array of floats and an array of integers. Each element
> in the array of integers represents a region in the float array of a certain
> "kind". The reduction should take place over like-kind values:
> add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,2,0,0,2,2])
> results in the calculations:
> 1 + 3 + 6 + 7
> 2 + 4
> 5 + 8 + 9
> and therefore the output (notice the two arrays --- perhaps a structured
> array should be returned instead...)
> [17, 6, 22]
> The real value is when you have tabular data and you want to do reductions
> in one field based on values in another field. This happens all the time
> in relational algebra and would be a relatively straightforward thing to
> support in ufuncs.
I might suggest a simplification where the by array must be an array
of non-negative ints such that they are indices into the output. For
example (note that I replace 2 with 3 and have no 2s in the by array):
add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,3,0,0,3,3]) ==
[17, 6, 0, 22]
This basically generalizes bincount() to other binary ufuncs.
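
For anyone who wants to experiment with the semantics, here is a rough sketch
in Python. Note that reduceby() does not exist in NumPy; the helper below is
hypothetical and assumes a NumPy recent enough to have the ufunc.at() method.
It also assumes the ufunc has an identity, which is used to fill output slots
that no element maps to. For np.add the result matches bincount() with
weights, apart from the output dtype.

```python
import numpy as np

def reduceby(ufunc, array, by):
    """Hypothetical helper sketching the proposed ufunc.reduceby().

    `by` must hold non-negative ints that index directly into the output.
    """
    array = np.asarray(array)
    by = np.asarray(by)
    if ufunc.identity is None:
        raise ValueError("this sketch needs a ufunc with an identity")
    # Slots that no element maps to keep the ufunc's identity value.
    out = np.full(by.max() + 1, ufunc.identity, dtype=array.dtype)
    # Unbuffered in-place operation: repeated indices accumulate.
    ufunc.at(out, by, array)
    return out

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
by = [0, 1, 0, 1, 3, 0, 0, 3, 3]

sums = reduceby(np.add, values, by)            # [17, 6, 0, 22]
weighted = np.bincount(by, weights=values)     # [17., 6., 0., 22.]
products = reduceby(np.multiply, values, by)   # [126, 8, 1, 360]
```

The empty slot at index 2 comes out as 0 for add and 1 for multiply, which is
one possible answer to what a missing group should contain.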
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco