[Numpy-discussion] Proposal for new ufunc functionality

Travis Oliphant oliphant@enthought....
Tue Apr 13 09:03:51 CDT 2010


On Apr 12, 2010, at 5:31 PM, Robert Kern wrote:
>>
>> We should collect all of these proposals into a NEP. To clarify what I
>> mean by "group-by" behavior.
>> Suppose I have an array of floats and an array of integers. Each element
>> in the array of integers represents a region in the float array of a
>> certain "kind". The reduction should take place over like-kind values:
>> Example:
>> add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,2,0,0,2,2])
>> results in the calculations:
>> 1 + 3 + 6 + 7
>> 2 + 4
>> 5 + 8 + 9
>> and therefore the output (notice the two arrays --- perhaps a structured
>> array should be returned instead...)
>> [0,1,2],
>> [17, 6, 22]
>>
>> The real value is when you have tabular data and you want to do reductions
>> in one field based on values in another field. This happens all the time
>> in relational algebra and would be a relatively straightforward thing to
>> support in ufuncs.
>
> I might suggest a simplification where the by array must be an array
> of non-negative ints such that they are indices into the output. For
> example (note that I replace 2 with 3 and have no 2s in the by array):
>
> add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,3,0,0,3,3]) ==
> [17, 6, 0, 22]
>
> This basically generalizes bincount() to other binary ufuncs.
>

Interesting proposal. I do like having only one output.
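
A rough sketch of the single-output form, for illustration only. The name
reduceby and its signature are hypothetical (no such ufunc method exists), and
the sketch assumes a later NumPy release that provides ufunc.at (1.8 and up)
and a ufunc with an identity value to fill bins that receive no elements:

```python
import numpy as np

def reduceby(ufunc, values, by):
    """Hypothetical helper: reduce `values` with `ufunc`, grouped by the
    non-negative integer bins in `by`. Not an existing NumPy API."""
    values = np.asarray(values)
    by = np.asarray(by, dtype=np.intp)
    if by.size and by.min() < 0:
        raise ValueError("'by' must contain non-negative integers")
    if ufunc.identity is None:
        raise ValueError("this sketch needs a ufunc with an identity")
    # One output slot per possible bin; untouched bins keep the identity.
    out = np.full(by.max() + 1, ufunc.identity, dtype=values.dtype)
    # Unbuffered in-place update: out[by[i]] = ufunc(out[by[i]], values[i])
    ufunc.at(out, by, values)
    return out

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
by = [0, 1, 0, 1, 3, 0, 0, 3, 3]
sums = reduceby(np.add, values, by)        # bin 2 is empty, so it stays 0
prods = reduceby(np.multiply, values, by)  # bin 2 is empty, so it stays 1
```

For the add case specifically, np.bincount(by, weights=values) computes the
same sums (as floats), which is the sense in which this generalizes bincount().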

I'm particularly interested in reductions with "by" arrays of strings,
i.e., something like:

add.reduceby([10,11,12,13,14,15,16],
             by=['red','green','red','green','red','blue','blue'])

resulting in:

10+12+14
11+13
15+16

In practice, the strings would essentially have to be mapped to the kind of
integer array I used in the original example. So I suppose that if we couple
your proposal with the segment function from the rest of my original
proposal, the same functionality is available (perhaps at the cost of an
extra intermediate integer array that may not be strictly necessary).
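
To make the string-to-integer mapping concrete, here is a minimal sketch that
uses np.unique as a stand-in for the segment function (an assumption on my
part; the segment function itself is not shown in this message). The helper
name reduceby_labels is hypothetical, and the sketch handles only the add case:

```python
import numpy as np

def reduceby_labels(values, by):
    """Hypothetical helper: sum `values` grouped by arbitrary labels in `by`.
    Returns (labels, sums), i.e. the two-array form of the result."""
    values = np.asarray(values)
    by = np.asarray(by)
    # labels: sorted unique keys; codes: for each element, the index of its
    # key in `labels`. `codes` is the intermediate integer "by" array.
    labels, codes = np.unique(by, return_inverse=True)
    # With integer codes, the sum reduces to a weighted bincount.
    sums = np.bincount(codes, weights=values)
    return labels, sums

labels, sums = reduceby_labels(
    [10, 11, 12, 13, 14, 15, 16],
    ['red', 'green', 'red', 'green', 'red', 'blue', 'blue'],
)
```

Note that np.unique returns the labels in sorted order (blue, green, red), not
in order of first appearance, and that bincount with weights returns floats.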

But, having simple building blocks is usually better in the long run  
(and typically leads to better optimizations by human programmers).

Thanks,

-Travis


--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
oliphant@enthought.com




