[Numpy-discussion] Find the N maximum values and corresponding indexes in an array

Keith Goodman kwgoodman@gmail....
Wed Dec 2 19:39:44 CST 2009


On Wed, Dec 2, 2009 at 5:27 PM, Anne Archibald
<peridot.faceted@gmail.com> wrote:
> 2009/12/2 Keith Goodman <kwgoodman@gmail.com>:
>> On Wed, Dec 2, 2009 at 5:09 PM, Neal Becker <ndbecker2@gmail.com> wrote:
>>> David Warde-Farley wrote:
>>>
>>>> On 2-Dec-09, at 6:55 PM, Howard Chong wrote:
>>>>
>>>>> def myFindMaxA(myList):
>>>>>    """implement finding maximum value with for loop iteration"""
>>>>>    maxIndex=0
>>>>>    maxVal=myList[0]
>>>>>    for index, item in enumerate(myList):
>>>>>        if item[0]>maxVal:
>>>>>            maxVal=item[0]
>>>>>            maxIndex=index
>>>>>    return [maxIndex, maxVal]
>>>>>
>>>>>
>>>>>
>>>>> My question is: how can I make the latter version run faster? I
>>>>> think the
>>>>> answer is that I have to do the iteration in C.
>>>>
>>>>
>>>> def find_biggest_n(myarray, n):
>>>> ind = np.argsort(myarray)
>>>> return ind[-n:], myarray[ind[-n:]]
>>>>
>>>> David
>>> Not bad, although I wonder whether a partial sort could be faster.
>>
>> I'm doing a lot of sorting right now. I only need to sort the lowest
>> 30% of values in a 1d array (about 250k elements), the rest I don't
>> need to sort. How do I do a partial sort?
>
> Algorithmically, if you're doing a quicksort, you just don't sort one
> side of a partition when it's outside the range you want sorted (which
> could even be data-dependent, I suppose, as well as
> number-of-items-dependent). This is useful for sorting out only the
> extreme elements, as well as finding quantiles (and those elements
> above/below them). Unfortunately I'm not aware of any implementation,
> useful though it might be.

Oh, I thought he meant there was a numpy function for partial sorting.

What kind of speed up for my problem (sort lower 30% of a 1d array
with 250k elements) could I expect if I paid someone to write it in
cython? Twice as fast? I'm actually doing x.argsort(), if that
matters.


More information about the NumPy-Discussion mailing list