[Numpy-discussion] Find indices of largest elements
Keith Goodman
kwgoodman@gmail....
Wed Apr 14 16:03:16 CDT 2010
On Wed, Apr 14, 2010 at 1:56 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Wed, Apr 14, 2010 at 12:39 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
>> Keith Goodman <kwgoodman@gmail.com> writes:
>>> On Wed, Apr 14, 2010 at 8:49 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
>>>> On Wed, Apr 14, 2010 at 8:16 AM, Nikolaus Rath <Nikolaus@rath.org> wrote:
>>>>> Hello,
>>>>>
>>>>> How do I best find out the indices of the largest x elements in an
>>>>> array?
>>>>>
>>>>> Example:
>>>>>
>>>>> a = [ [1,8,2], [2,1,3] ]
>>>>> magic_function(a, 2) == [ (0,1), (1,2) ]
>>>>>
>>>>> Since the largest 2 elements are at positions (0,1) and (1,2).
>>>>
>>>> Here's a quick way to rank the data if there are no ties and no NaNs:
>>>
>>> ...or if you need the indices in order:
>>>
>>>>> shape = (3,2)
>>>>> x = np.random.rand(*shape)
>>>>> x
>>> array([[ 0.52420123, 0.43231286],
>>> [ 0.97995333, 0.87416228],
>>> [ 0.71604075, 0.66018382]])
>>>>> r = x.reshape(-1).argsort().argsort()
>>
>> I don't understand why this works. Why do you call argsort() twice?
>> Doesn't that give you the indices of the sorted indices?
>
> It is confusing. Let's look at an example:
>
>>> x = np.random.rand(4)
>>> x
> array([ 0.37412289, 0.68248559, 0.12935131, 0.42510212])
>
> If we call argsort once we get the index that will sort x:
>
>>> idx = x.argsort()
>>> idx
> array([2, 0, 3, 1])
>>> x[idx]
> array([ 0.12935131, 0.37412289, 0.42510212, 0.68248559])
>
> Notice that the first element of idx is 2. That's because element x[2]
> is the min of x. But that's not what we want. We want the first
> element to be the rank of the first element of x. So we need to
> shuffle idx around so that the order aligns with x. How do we do that?
> We sort it!
>
>>> idx.argsort()
> array([1, 3, 0, 2])
>
> The min value of x is x[2]; that's why 2 is the first element of idx.
> It also means we want ranked(x) to contain a 0 at position 2, which
> it does.
>
> Bah, it's all magic.
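Less magic: the second argsort is inverting the permutation returned by the first one. Here is a minimal sketch of the same 0-based ranking, assuming a 1-D array with no ties and no NaNs (the same conditions as the argsort().argsort() trick). It replaces the second argsort with a scatter assignment, which is O(n) instead of O(n log n). `rank_by_scatter` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def rank_by_scatter(x):
    """Return 0-based ranks of a 1-D array (hypothetical helper).

    Assumes no ties and no NaNs, like x.argsort().argsort().
    """
    x = np.asarray(x)
    idx = x.argsort()               # idx[k] = position of the k-th smallest value
    ranks = np.empty_like(idx)
    ranks[idx] = np.arange(x.size)  # invert the permutation: position idx[k] gets rank k
    return ranks

x = np.array([0.37412289, 0.68248559, 0.12935131, 0.42510212])
print(x.argsort().argsort())  # [1 3 0 2]
print(rank_by_scatter(x))     # [1 3 0 2]
```

Both lines print the same ranks: position 2 holds the minimum, so it gets rank 0.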
You can also use rankdata from scipy:
>>> from scipy.stats import rankdata
>>> rankdata(x)
array([ 2., 4., 1., 3.])
Note that the smallest rank is 1.
>>> rankdata(x) - 1
array([ 1., 3., 0., 2.])
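The original question asked for (row, column) positions of the n largest elements, not full ranks. Here is a minimal sketch, assuming no NaNs and accepting that ties are broken arbitrarily. It flattens the array with `ravel`, sorts once, keeps the last n flat indices (largest first), and maps them back to per-axis indices with `np.unravel_index`. `magic_function` is the hypothetical name from the question, not a NumPy API.

```python
import numpy as np

def magic_function(a, n):
    """Return index tuples of the n largest elements of a, largest first.

    Hypothetical helper named after the question; ties are broken arbitrarily.
    """
    a = np.asarray(a)
    flat_idx = a.ravel().argsort()[::-1][:n]      # flat positions, largest value first
    coords = np.unravel_index(flat_idx, a.shape)  # one index array per axis
    return [tuple(int(i) for i in c) for c in zip(*coords)]

a = [[1, 8, 2], [2, 1, 3]]
print(magic_function(a, 2))  # [(0, 1), (1, 2)]
```

For large arrays, `np.argpartition(a.ravel(), -n)[-n:]` finds the same n flat positions without a full sort, but in no particular order.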