[Numpy-discussion] indexed arrays ignoring duplicates
josef.pktd@gmai...
Wed Sep 29 23:45:14 CDT 2010
On Thu, Sep 30, 2010 at 12:07 AM, <josef.pktd@gmail.com> wrote:
> On Wed, Sep 29, 2010 at 11:24 PM, Damien Morton <dmorton@bitfurnace.com> wrote:
>> On Thu, Sep 30, 2010 at 11:11 AM, <josef.pktd@gmail.com> wrote:
>>>> bincount only works for gathering/accumulating scalars. Even the
>>>> 'weights' parameter is limited to scalars.
>>>
>>> Do you mean that bincount only works with 1d arrays? I also think that
>>> this is a major limitation of it.
>>
>> >>> from numpy import *
>> >>> a = array((1,2,2,3,3))
>> >>> w = array(((1,2),(3,4),(5,6),(7,8),(9,10)))
>> >>> bincount(a,weights=w)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: object too deep for desired array
>> >>> w0 = array((1,2,3,4,5))
>> >>> bincount(a,weights=w0)
>> array([ 0.,  1.,  5.,  9.])
>
> Since I'm not enough of a C person to change bincount, how about
>
> >>> a = np.array((1,2,2,3,3))
> >>> w = np.array(((1,2),(3,4),(5,6),(7,8),(9,10)))
> >>> a2 = np.array((1,2,2,3,3))[:,None]-1 + np.array([0, a.max()])
> >>> a
> array([1, 2, 2, 3, 3])
> >>> w
> array([[ 1,  2],
>        [ 3,  4],
>        [ 5,  6],
>        [ 7,  8],
>        [ 9, 10]])
> >>> np.bincount(a2.ravel(),weights=w.ravel()).reshape(2,-1).T
> array([[  1.,   2.],
>        [  8.,  10.],
>        [ 16.,  18.]])
>
> I never thought of doing this before and I have been using bincount
> for some time.
For future searches, this seems to work:
>>> w = np.arange(5*4).reshape(5,4)
>>> a = np.random.randint(5,8, size=5)
>>> a
array([6, 5, 6, 5, 7])
>>> w
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])
>>> a2 = a[:,None]-a.min() + (a.ptp()+1) * np.arange(w.shape[1])
>>> np.bincount(a2.ravel(),weights=w.ravel()).reshape(w.shape[1],-1).T
array([[ 16.,  18.,  20.,  22.],
       [  8.,  10.,  12.,  14.],
       [ 16.,  17.,  18.,  19.]])
The result will include a row of zeros for any index between a.min()
and a.max() that has a zero count.
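
A minimal sketch of the same trick wrapped in a function, in case it
helps anyone landing here from a search. The name grouped_sum is
hypothetical (it is not a NumPy function), and the sketch assumes integer
labels and a 2d weights array with one row per label. It computes the
label range from min and max rather than calling a.ptp(), since the
ndarray.ptp() method was removed in NumPy 2.0. The last lines cross-check
the result against np.add.at, which was added to NumPy after this thread
and accepts 2d values directly.

```python
import numpy as np


def grouped_sum(labels, weights):
    """Sum the rows of `weights` that share the same integer label.

    Hypothetical helper, not part of NumPy. Returns one row per label
    value from labels.min() to labels.max() inclusive; label values that
    never occur produce a row of zeros.
    """
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)
    n_cols = weights.shape[1]
    offset = labels - labels.min()      # shift labels so they start at 0
    n_bins = int(offset.max()) + 1      # number of label values in range
    # Give every column of `weights` its own block of n_bins bins.
    flat_index = offset[:, None] + n_bins * np.arange(n_cols)
    sums = np.bincount(flat_index.ravel(), weights=weights.ravel(),
                       minlength=n_bins * n_cols)
    # Undo the flattening: one row per column block, then transpose.
    return sums.reshape(n_cols, n_bins).T


a = np.array([6, 5, 6, 5, 7])
w = np.arange(5 * 4).reshape(5, 4)
result = grouped_sum(a, w)
print(result)

# Cross-check with unbuffered in-place addition.
check = np.zeros((3, 4))
np.add.at(check, a - a.min(), w)
print(np.array_equal(result, check))
```

The bincount version makes a single pass over a flattened copy of the
weights; np.add.at is shorter to write, and which one is faster probably
depends on the NumPy version and the array sizes.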
Josef
>
>
>>
>>>> I propose the name 'gather()' for the helper function that does this.
>>>
>>> I don't think "gather" is an obvious name to search for.
>>
>> "gather" is the name that the GPGPU community uses to describe this
>> kind of operation. Not just for summation but for any kind of indexed
>> reducing operation.
>
> Some group functions that Travis is planning might go in this direction.
>
> Josef
>
>