[SciPy-User] scoreatpercentile behaviour

josef.pktd@gmai... josef.pktd@gmai...
Fri Jan 25 09:17:25 CST 2013


On Fri, Jan 25, 2013 at 9:07 AM, Sturla Molden <sturla@molden.no> wrote:
> On 24.01.2013 18:46, Andreas Hilboll wrote:
>
>> Sorry for the noise, it turns out I was just too blind / tired /
>> whatever to notice the very first line of the function
>>
>>     values = np.sort(a, axis=0)
>>
>> My dumb fault. Thanks for making me realize.
>
> I wonder if it's not better to use quickselect than quicksort for
> percentiles? We can quickselect to find the sample closest to the
> desired percentile. Then (since quickselect leaves the data partially
> sorted), we can find the k-nearest neighbors by linear search in each
> direction, possibly using a heap, leaving us with 2*k+1 samples with
> which to interpolate the score at the percentile. That should give us
> the percentiles in average O(n) instead of average O(n log n) time.

Would this help much if we need the interquartile range, i.e 25 an 75
percentile?
(That's the main usecase currently in statsmodels.)

quickselect (partial sorting) sounds useful if we just need a few
percentiles in the tail close to each other.

Josef

>
> Sturla
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user


More information about the SciPy-User mailing list