[Numpy-discussion] indexed arrays ignoring duplicates

josef.pktd@gmai... josef.pktd@gmai...
Wed Sep 29 23:45:14 CDT 2010


On Thu, Sep 30, 2010 at 12:07 AM,  <josef.pktd@gmail.com> wrote:
> On Wed, Sep 29, 2010 at 11:24 PM, Damien Morton <dmorton@bitfurnace.com> wrote:
>> On Thu, Sep 30, 2010 at 11:11 AM,  <josef.pktd@gmail.com> wrote:
>>>> bincount only works for gathering/accumulating scalars. Even the
>>>> 'weights' parameter is limited to scalars.
>>>
>>> Do you mean that bincount only works with 1d arrays? I also think that
>>> this is a major limitation of it.
>>
>>>>> from numpy import *
>>>>> a = array((1,2,2,3,3))
>>>>> w = array(((1,2),(3,4),(5,6),(7,8),(9,10)))
>>>>> bincount(a,weights=w)
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>> ValueError: object too deep for desired array
>>>>> w0 = array((1,2,3,4,5))
>>>>> bincount(a,weights=w0)
>> array([ 0.,  1.,  5.,  9.])
>
> Since I'm not a C person to change bincount, how about
>
>>>> a = np.array((1,2,2,3,3))
>>>> w = np.array(((1,2),(3,4),(5,6),(7,8),(9,10)))
>>>> a2 = np.array((1,2,2,3,3))[:,None]-1 + np.array([0, a.max()])
>>>> a
> array([1, 2, 2, 3, 3])
>>>> w
> array([[ 1,  2],
>       [ 3,  4],
>       [ 5,  6],
>       [ 7,  8],
>       [ 9, 10]])
>>>> np.bincount(a2.ravel(),weights=w.ravel()).reshape(2,-1).T
> array([[  1.,   2.],
>       [  8.,  10.],
>       [ 16.,  18.]])
>
> I never thought of doing this before and I have been using bincount
> for some time.

For future searches: this seems to work for weights with any number of columns.

>>> w = np.arange(5*4).reshape(5,4)
>>> a = np.random.randint(5,8, size=5)
>>> a
array([6, 5, 6, 5, 7])
>>> w
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

>>> a2 = a[:,None]-a.min() + (a.ptp()+1) * np.arange(w.shape[1])
>>> np.bincount(a2.ravel(),weights=w.ravel()).reshape(w.shape[1],-1).T
array([[ 16.,  18.,  20.,  22.],
       [  8.,  10.,  12.,  14.],
       [ 16.,  17.,  18.,  19.]])

The result will include a row of zeros for every index between a.min()
and a.max() that has zero count.
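
For anyone who wants to reuse this, the same offset trick could be wrapped
in a small helper along these lines. This is only a sketch: bincount_2d is a
made-up name, not a NumPy function, and it assumes that a holds integer
labels and that w is 2-D with one row per label. It uses a.max() - a.min()
for the peak-to-peak range and passes minlength so that the reshape does not
depend on which bins happen to be filled.

```python
import numpy as np

def bincount_2d(a, w):
    """Sum the rows of w that share the same integer label in a.

    Hypothetical helper, not part of NumPy. Row k of the result
    holds the column sums for label a.min() + k.
    """
    a = np.asarray(a)
    w = np.asarray(w)
    n_groups = int(a.max() - a.min()) + 1   # peak-to-peak range + 1
    n_cols = w.shape[1]
    # Shift labels to start at 0, then give each column its own block of bins.
    idx = (a[:, None] - a.min()) + n_groups * np.arange(n_cols)
    sums = np.bincount(idx.ravel(), weights=w.ravel(),
                       minlength=n_groups * n_cols)
    # One block of n_groups bins per column; transpose to (group, column).
    return sums.reshape(n_cols, n_groups).T

# Same data as in the session above.
a = np.array([6, 5, 6, 5, 7])
w = np.arange(5 * 4).reshape(5, 4)
result = bincount_2d(a, w)

# Label 6 never occurs here, so its row comes back as zeros.
gap = bincount_2d(np.array([5, 7, 7]), np.array([[1, 2], [3, 4], [5, 6]]))
```

Row k of the result corresponds to label a.min() + k, so a label that never
occurs shows up as an all-zero row, as in gap.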

Josef

>
>
>>
>>>> I propose the name 'gather()' for the helper function that does this.
>>>
>>> I don't think "gather" is an obvious name to search for.
>>
>> "gather" is the name that the GPGPU community uses to describe this
>> kind of operation. Not just for summation but for any kind of indexed
>> reducing operation.
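
As a rough illustration of an indexed reduction in the sense described
above: later NumPy releases (not the version discussed in this thread)
provide the ufunc.at method, which applies a ufunc unbuffered at the given
indices, so repeated indices accumulate. The sketch below assumes such a
release; the arrays are reused from the earlier example.

```python
import numpy as np

a = np.array([1, 2, 2, 3, 3])
w = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])

# Indexed sum: one output row per label 0..a.max().
sums = np.zeros((a.max() + 1, w.shape[1]))
np.add.at(sums, a, w)          # unbuffered, so repeated labels accumulate

# Indexed maximum: same pattern with a different ufunc.
maxima = np.full((a.max() + 1, w.shape[1]), -np.inf)
np.maximum.at(maxima, a, w)    # rows for unused labels stay at -inf
```

Unlike the bincount trick, this handles 2-D weights directly and is not
limited to summation, though it needs a preallocated output array.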
>
> Some group functions that Travis is planning might go in this direction.
>
> Josef
>

