[Numpy-discussion] Proposal for new ufunc functionality
josef.pktd@gmai...
Sat Apr 10 12:49:37 CDT 2010
On Sat, Apr 10, 2010 at 1:23 PM, Travis Oliphant <oliphant@enthought.com> wrote:
>
> Hi,
>
> I've been mulling over a couple of ideas for new ufunc methods plus a
> couple of numpy functions that I think will help implement group-by
> operations with NumPy arrays.
>
> I wanted to discuss them on this list to get input from others
> before putting forward an actual proposal or patch.
>
> The group-by operation is very common in relational algebra and NumPy
> arrays (especially structured arrays) can often be seen as a database
> table. There are common and easy-to-implement approaches for select
> and other relational algebra concepts, but you basically have to
> implement group-by yourself.
>
> Here are my suggested additions to NumPy:
>
> ufunc methods:
> * reduceby (array, by, sorted=1, axis=0)
>
> array is the array to reduce
> by is the array to provide the grouping (can be a structured
> array or a list of arrays)
>
> if sorted is 1, then possibly a faster algorithm can be
> used.
How is the grouping in "by" specified?
These functions would be very useful for statistics. One problem with
the current bincount is that it doesn't allow multi-dimensional weight
arrays (with an axis argument).
Josef
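For illustration, the reduceby semantics above, including the multi-dimensional weights case that bincount cannot handle, can be approximated with existing NumPy calls. This is a minimal sketch, not the proposed implementation: it assumes `by` is a single 1-D key array (the proposal also allows a structured array or a list of arrays), passes the ufunc explicitly, and uses a hypothetical `assume_sorted` flag in place of `sorted`. It brings equal keys together, finds where each group starts, and hands those fence-posts to the existing `ufunc.reduceat`.

```python
import numpy as np


def reduceby(ufunc, array, by, assume_sorted=False, axis=0):
    """Hypothetical sketch of the proposed ufunc.reduceby (not a NumPy API).

    Assumes `by` is a non-empty 1-D key array with one key per slice of
    `array` along `axis`. Returns the sorted unique keys and the
    reduction of each group.
    """
    data = np.moveaxis(np.asarray(array), axis, 0)
    # Map each key to a dense integer label 0..n_groups-1 (sorted by key).
    keys, labels = np.unique(by, return_inverse=True)
    labels = labels.ravel()
    if not assume_sorted:
        # A stable sort brings equal keys together; a sorted `by` skips this.
        order = np.argsort(labels, kind="stable")
        data, labels = data[order], labels[order]
    # Fence-posts: index 0 plus every position where the label changes.
    starts = np.concatenate(([0], np.flatnonzero(labels[1:] != labels[:-1]) + 1))
    result = ufunc.reduceat(data, starts, axis=0)
    return keys, np.moveaxis(result, 0, axis)


by = np.array([1, 2, 1, 2, 3])
w = np.arange(10).reshape(5, 2)          # 2-D "weights", grouped along axis 0
keys, sums = reduceby(np.add, w, by)
print(keys.tolist())                     # [1, 2, 3]
print(sums.tolist())                     # [[4, 6], [8, 10], [8, 9]]
```

Because the grouping step only produces fence-posts, any binary ufunc with a `reduceat` method (e.g. `np.maximum`, which has no identity) works, not just addition.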
>
> * reducein (array, indices, axis=0)
>
> similar to reduceat, but the indices provide both the
> start and end points (rather than being fence-posts as in reduceat).
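A slow but straightforward reference for reducein might look like the following. It is a hypothetical sketch: it assumes the indices arrive as separate `starts` and `ends` sequences describing half-open intervals `[start, end)`, which is only one possible reading of "provide both the start and end points". Because each interval is reduced independently, segments may overlap or leave gaps, which reduceat's fence-post indices cannot express.

```python
import numpy as np


def reducein(ufunc, array, starts, ends, axis=0):
    """Hypothetical sketch of the proposed ufunc.reducein (not a NumPy API).

    Reduces array[start:end] along `axis` for each (start, end) pair,
    treating every pair as a half-open interval.
    """
    array = np.asarray(array)
    results = [
        # np.take with a range of indices selects one segment along `axis`.
        ufunc.reduce(np.take(array, np.arange(s, e), axis=axis), axis=axis)
        for s, e in zip(starts, ends)
    ]
    return np.stack(results, axis=axis)


a = np.arange(10)
# Overlapping segments [0, 3) and [2, 8), plus [5, 10) running to the end.
print(reducein(np.add, a, [0, 2, 5], [3, 8, 10]).tolist())   # [3, 27, 35]
# reduceat takes fence-posts, so consecutive segments share boundaries.
print(np.add.reduceat(a, [0, 3, 8]).tolist())                # [3, 25, 17]
```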
>
> numpy functions (or methods):
>
> * segment(array)
>
> produce an array of integers identifying the different "regions"
> of an array:
>
> segment([10,20,10,20,30,30,10]) would produce [0,1,0,1,2,2,0]
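For a 1-D input, the segment example above can be reproduced with `np.unique(..., return_inverse=True)`. This is only an approximation of the proposal: it labels values in sorted order (which happens to coincide with first-appearance order in the example), and it treats equal values anywhere in the array as the same region rather than labeling contiguous runs separately.

```python
import numpy as np


def segment(array):
    """Hypothetical sketch of the proposed segment() (not a NumPy API).

    Returns, for each element, the index of its value among the sorted
    distinct values of `array`.
    """
    _, labels = np.unique(array, return_inverse=True)
    return labels.ravel()


print(segment([10, 20, 10, 20, 30, 30, 10]).tolist())   # [0, 1, 0, 1, 2, 2, 0]
```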
>
>
> * edges(array, at=True)
>
> produce an index array providing the edges (either fence-post-like
> syntax for reduceat or both boundaries as for reducein).
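Assuming the array is already grouped (equal values contiguous, e.g. after sorting), the edges can be found by comparing each element with its predecessor. The sketch below is hypothetical: with `at=True` it returns fence-post start indices that can be passed directly to the existing `ufunc.reduceat`; otherwise it returns `(starts, ends)` half-open boundaries of the kind reducein would take.

```python
import numpy as np


def edges(array, at=True):
    """Hypothetical sketch of the proposed edges() (not a NumPy API).

    Assumes a non-empty 1-D array whose equal values are contiguous.
    """
    array = np.asarray(array)
    # A region starts at 0 and wherever a value differs from the previous one.
    starts = np.concatenate(([0], np.flatnonzero(array[1:] != array[:-1]) + 1))
    if at:
        return starts                      # fence-posts for reduceat
    ends = np.append(starts[1:], len(array))
    return starts, ends                    # both boundaries, half-open


groups = np.array([1, 1, 2, 2, 2, 3])
values = np.arange(6)
print(edges(groups).tolist())                            # [0, 2, 5]
print(np.add.reduceat(values, edges(groups)).tolist())   # [1, 9, 5]
```

Once equal keys are contiguous, finding the group boundaries is a single linear pass, which is where a `sorted=1` fast path could save work.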
>
>
> Thoughts?
>
> -Travis
>
> Thoughts on the general idea?
>
>
> --
> Travis Oliphant
> Enthought Inc.
> 1-512-536-1057
> http://www.enthought.com
> oliphant@enthought.com
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>