[SciPy-dev] percentileofscore

Robert Kern robert.kern@gmail....
Sun Nov 16 21:56:02 CST 2008


On Sun, Nov 16, 2008 at 21:44,  <josef.pktd@gmail.com> wrote:
> What is percentileofscore supposed to do?
> I did not find any good interpretation what the numbers
> are supposed to mean.

It's a poor implementation (IMO; I wrote that comment).

> From statistics, I am used to a definition according to the
> cdf, i.e. fraction of elements weakly smaller than the "score".

Yup.

> Instead, a strictly smaller definition could be useful, as
> used e.g. in ranking of schools.
> The current implementation, which uses a histogram, does not give
> results that I can easily interpret.
> The proposed implementation still has one error, as mentioned
> by Stefan. It uses the mean when multiple elements are present.
>
> I looked at 3 cases:
> * the score element is uniquely present in array
> * multiple elements in the array are equal to the score
> * no element in the array is equal to the score
>
> I tried out 5 different definitions:
> * percentileofscore_proposed: taken from google review with correction
> * percentileofscore_mean: similar to proposed, give mean rank if multiple present.
>     This just adds another correction to the proposed version (start
>     index at one instead of zero)
> * percentileofscore_meaninterp: similar to proposed, interpolate if missing
> * percentileofscore_strict: one-liner, Fraction(x<score)
> * percentileofscore_weak: one-liner, Fraction(x<=score)

Wikipedia says to use half of the frequency of the ties (x==score) in
addition to the cumulative frequency of strict x<score.

  http://en.wikipedia.org/wiki/Percentile_rank
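
To make the three candidate definitions concrete, here is a minimal sketch
in plain Python. The function name percentile_rank and the kind argument
are hypothetical, chosen for illustration only; this is not the existing
or the proposed scipy implementation. "strict" and "weak" are the two
one-liners from the list above, and "mean" is my reading of the Wikipedia
rule (strict count plus half of the ties).

```python
def percentile_rank(a, score, kind="mean"):
    """Percentile rank of `score` relative to the values in `a`, in percent.

    Hypothetical helper for illustration; name and signature are assumptions.
    kind="strict": fraction of elements with x < score
    kind="weak":   fraction of elements with x <= score (empirical cdf)
    kind="mean":   strict count plus half the number of ties (x == score)
    """
    values = list(a)
    n = len(values)
    if n == 0:
        raise ValueError("percentile rank is undefined for an empty sample")

    n_strict = sum(1 for x in values if x < score)   # elements below the score
    n_ties = sum(1 for x in values if x == score)    # elements equal to the score

    if kind == "strict":
        count = n_strict
    elif kind == "weak":
        count = n_strict + n_ties
    elif kind == "mean":
        count = n_strict + 0.5 * n_ties
    else:
        raise ValueError("kind must be 'strict', 'weak' or 'mean'")

    return 100.0 * count / n


sample = [1, 2, 3, 3, 4]
for s in (2, 3, 2.5):  # unique score, tied score, absent score
    print(s, [percentile_rank(sample, s, k) for k in ("strict", "weak", "mean")])
```

When no element equals the score, all three variants agree; they differ
only in how ties are counted, which is exactly the case where the
definitions in this thread disagree.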

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

