[Numpy-discussion] Find the N maximum values and corresponding indexes in an array

Anne Archibald peridot.faceted@gmail....
Wed Dec 2 19:52:02 CST 2009

2009/12/2 Keith Goodman <kwgoodman@gmail.com>:
> On Wed, Dec 2, 2009 at 5:27 PM, Anne Archibald
> <peridot.faceted@gmail.com> wrote:
>> 2009/12/2 Keith Goodman <kwgoodman@gmail.com>:
>>> On Wed, Dec 2, 2009 at 5:09 PM, Neal Becker <ndbecker2@gmail.com> wrote:
>>>> David Warde-Farley wrote:
>>>>> On 2-Dec-09, at 6:55 PM, Howard Chong wrote:
>>>>>> def myFindMaxA(myList):
>>>>>>    """implement finding maximum value with for loop iteration"""
>>>>>>    maxIndex=0
>>>>>>    maxVal=myList[0]
>>>>>>    for index, item in enumerate(myList):
>>>>>>        if item[0]>maxVal:
>>>>>>            maxVal=item[0]
>>>>>>            maxIndex=index
>>>>>>    return [maxIndex, maxVal]
>>>>>> My question is: how can I make the latter version run faster? I
>>>>>> think the
>>>>>> answer is that I have to do the iteration in C.
>>>>> def find_biggest_n(myarray, n):
>>>>> ind = np.argsort(myarray)
>>>>> return ind[-n:], myarray[ind[-n:]]
>>>>> David
>>>> Not bad, although I wonder whether a partial sort could be faster.
>>> I'm doing a lot of sorting right now. I only need to sort the lowest
>>> 30% of values in a 1d array (about 250k elements), the rest I don't
>>> need to sort. How do I do a partial sort?
>> Algorithmically, if you're doing a quicksort, you just don't sort one
>> side of a partition when it's outside the range you want sorted (which
>> could even be data-dependent, I suppose, as well as
>> number-of-items-dependent). This is useful for sorting out only the
>> extreme elements, as well as finding quantiles (and those elements
>> above/below them). Unfortunately I'm not aware of any implementation,
>> useful though it might be.
> Oh, I thought he meant there was a numpy function for partial sorting.
> What kind of speed up for my problem (sort lower 30% of a 1d array
> with 250k elements) could I expect if I paid someone to write it in
> cython? Twice as fast? I'm actually doing x.argsort(), if that
> matters.

Hard to say exactly, but I'd say at best a factor of two. You can get
a rough best-case value by doing an argmin followed by an argsort of
the first 30%.

In reality things will be a bit worse because you won't immediately be
able to discard the whole rest of the array, and because your argsort
will be accessing data spread over a larger swath of memory (so worse
cache performance). But if this doesn't provide a useful speedup, it's
very unlikely partial sorting will be of any use to you.

Medians and other quantiles, or partitioning tasks, are where it really shines.


More information about the NumPy-Discussion mailing list