[Numpy-discussion] Find indices of largest elements
Wed Apr 14 17:12:49 CDT 2010
On 14 April 2010 16:56, Keith Goodman <firstname.lastname@example.org> wrote:
> On Wed, Apr 14, 2010 at 12:39 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
>> Keith Goodman <email@example.com> writes:
>>> On Wed, Apr 14, 2010 at 8:49 AM, Keith Goodman <firstname.lastname@example.org> wrote:
>>>> On Wed, Apr 14, 2010 at 8:16 AM, Nikolaus Rath <Nikolaus@rath.org> wrote:
>>>>> How do I best find out the indices of the largest x elements in an
>>>>> a = [ [1,8,2], [2,1,3] ]
>>>>> magic_function(a, 2) == [ (0,1), (1,2) ]
>>>>> Since the largest 2 elements are at positions (0,1) and (1,2).
>>>> Here's a quick way to rank the data if there are no ties and no NaNs:
>>> ...or if you need the indices in order:
>>>>> shape = (3,2)
>>>>> x = np.random.rand(*shape)
>>> array([[ 0.52420123, 0.43231286],
>>> [ 0.97995333, 0.87416228],
>>> [ 0.71604075, 0.66018382]])
>>>>> r = x.reshape(-1).argsort().argsort()
>> I don't understand why this works. Why do you call argsort() twice?
>> Doesn't that give you the indices of the sorted indices?
> It is confusing. Let's look at an example:
>>> x = np.random.rand(4)
> array([ 0.37412289, 0.68248559, 0.12935131, 0.42510212])
> If we call argsort once we get the index that will sort x:
>>> idx = x.argsort()
> array([2, 0, 3, 1])
> array([ 0.12935131, 0.37412289, 0.42510212, 0.68248559])
> Notice that the first element of idx is 2. That's because element x
> is the min of x. But that's not what we want. We want the first
> element to be the rank of the first element of x. So we need to
> shuffle idx around so that the order aligns with x. How do we do that?
> We sort it!
Unless I am mistaken, what you are doing here is inverting the
permutation returned by the first argsort. The second argsort is an n
log n method, though, and permutations can be inverted in linear time:
ix = np.argsort(X)
ixinv = np.zeros_like(ix)
ixinv[ix] = np.arange(len(ix))
This works because if ix is a permutation and ixinv is its inverse,
A = B[ix]
is the same as
A[ixinv] = B
This also means that you can often do without the actual inverse by
moving the indexing operation to the other side of the equal sign.
(Not in the OP's case, though.)
I'll also add that if the OP needs the top m for 1<m<<n, sorting the
whole input array is not the most efficient algorithm; there are
priority-queue-based schemes that are asymptotically more efficient,
but none exists in numpy. Since numpy's sorting is quite fast, I
personally would just use the sorting.
> array([1, 3, 0, 2])
> The min value of x is x, that's why 2 is the first element of idx
> which means that we want ranked(x) to contain a 0 at position 2 which
> it does.
> Bah, it's all magic.
> NumPy-Discussion mailing list
More information about the NumPy-Discussion