[Numpy-discussion] Find indices of largest elements
Charles R Harris
charlesr.harris@gmail....
Wed Apr 14 22:55:16 CDT 2010
On Wed, Apr 14, 2010 at 4:39 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Wed, Apr 14, 2010 at 3:12 PM, Anne Archibald
> <peridot.faceted@gmail.com> wrote:
> > On 14 April 2010 16:56, Keith Goodman <kwgoodman@gmail.com> wrote:
> >> On Wed, Apr 14, 2010 at 12:39 PM, Nikolaus Rath <Nikolaus@rath.org>
> wrote:
> >>> Keith Goodman <kwgoodman@gmail.com> writes:
> >>>> On Wed, Apr 14, 2010 at 8:49 AM, Keith Goodman <kwgoodman@gmail.com>
> wrote:
> >>>>> On Wed, Apr 14, 2010 at 8:16 AM, Nikolaus Rath <Nikolaus@rath.org>
> wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> How do I best find out the indices of the largest x elements in an
> >>>>>> array?
> >>>>>>
> >>>>>> Example:
> >>>>>>
> >>>>>> a = [ [1,8,2], [2,1,3] ]
> >>>>>> magic_function(a, 2) == [ (0,1), (1,2) ]
> >>>>>>
> >>>>>> Since the largest 2 elements are at positions (0,1) and (1,2).
> >>>>>
> >>>>> Here's a quick way to rank the data if there are no ties and no NaNs:
> >>>>
> >>>> ...or if you need the indices in order:
> >>>>
> >>>>>> shape = (3,2)
> >>>>>> x = np.random.rand(*shape)
> >>>>>> x
> >>>> array([[ 0.52420123, 0.43231286],
> >>>> [ 0.97995333, 0.87416228],
> >>>> [ 0.71604075, 0.66018382]])
> >>>>>> r = x.reshape(-1).argsort().argsort()
> >>>
> >>> I don't understand why this works. Why do you call argsort() twice?
> >>> Doesn't that give you the indices of the sorted indices?
> >>
> >> It is confusing. Let's look at an example:
> >>
> >>>> x = np.random.rand(4)
> >>>> x
> >> array([ 0.37412289, 0.68248559, 0.12935131, 0.42510212])
> >>
> >> If we call argsort once we get the index that will sort x:
> >>
> >>>> idx = x.argsort()
> >>>> idx
> >> array([2, 0, 3, 1])
> >>>> x[idx]
> >> array([ 0.12935131, 0.37412289, 0.42510212, 0.68248559])
> >>
> >> Notice that the first element of idx is 2. That's because element x[2]
> >> is the min of x. But that's not what we want. We want the first
> >> element to be the rank of the first element of x. So we need to
> >> shuffle idx around so that the order aligns with x. How do we do that?
> >> We sort it!
> >
> > Unless I am mistaken, what you are doing here is inverting the
> > permutation returned by the first argsort. The second argsort is an n
> > log n method, though, and permutations can be inverted in linear time:
> >
> > ix = np.argsort(X)
> > ixinv = np.zeros_like(ix)
> > ixinv[ix] = np.arange(len(ix))
> >
> > This works because if ix is a permutation and ixinv is its inverse,
> > A = B[ix]
> > is the same as
> > A[ixinv] = B
> > This also means that you can often do without the actual inverse by
> > moving the indexing operation to the other side of the equal sign.
> > (Not in the OP's case, though.)
>
> That is very nice. And very fast for large arrays:
>
> >> x = np.random.rand(4)
> >> timeit idx = x.argsort().argsort()
> 1000000 loops, best of 3: 1.45 us per loop
> >> timeit idx = x.argsort(); idxinv = np.zeros_like(idx); idxinv[idx] =
> np.arange(len(idx))
> 100000 loops, best of 3: 9.52 us per loop
>
> >> x = np.random.rand(1000)
> >> timeit idx = x.argsort().argsort()
> 10000 loops, best of 3: 112 us per loop
> >> timeit idx = x.argsort(); idxinv = np.zeros_like(idx); idxinv[idx] =
> np.arange(len(idx))
> 10000 loops, best of 3: 82.9 us per loop
>
> >> x = np.random.rand(100000)
> >> timeit idx = x.argsort().argsort()
> 10 loops, best of 3: 20.4 ms per loop
> >> timeit idx = x.argsort(); idxinv = np.zeros_like(idx); idxinv[idx] =
> np.arange(len(idx))
> 100 loops, best of 3: 13.2 ms per loop
>
> > I'll also add that if the OP needs the top m for 1<m<<n, sorting the
> > whole input array is not the most efficient algorithm; there are
> > priority-queue-based schemes that are asymptotically more efficient,
> > but none exists in numpy. Since numpy's sorting is quite fast, I
> > personally would just use the sorting.
>
> Partial sorting would find a lot of uses in the numpy community (like
> in median).
>
Thinking about it... along with a lot of other things.
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20100414/2bed7129/attachment.html
More information about the NumPy-Discussion
mailing list