Sun Nov 16 21:56:02 CST 2008
On Sun, Nov 16, 2008 at 21:44, <email@example.com> wrote:
> What is percentileofscore supposed to do?
> I did not find any good interpretation what the numbers
> are supposed to mean.
It's a poor implementation (IMO; I wrote that comment).
> From statistics, I am used to a definition according to the
> cdf, i.e. fraction of elements weakly smaller than the "score".
> Alternatively, a strictly smaller definition could be useful, as
> used e.g. in ranking of schools.
> The current implementation, which uses a histogram, does not give
> results that I can easily interpret.
> The proposed implementation still has one error, as mentioned
> by Stefan. It uses the mean when multiple elements are present.
> I looked at 3 cases:
> * the score element is uniquely present in array
> * multiple elements in the array are equal to the score
> * no element in the array is equal to the score
> I tried out 5 different definitions:
> percentileofscore_proposed: taken from google review with correction
> percentileofscore_mean: similar to proposed, gives mean rank if multiple present
> This just adds another correction to the proposed version (start
> index at one instead of zero)
> percentileofscore_meaninterp: similar to proposed, interpolates if missing
> percentileofscore_strict: one-liner, Fraction(x<score)
> percentileofscore_weak: one-liner, Fraction(x<=score)
Wikipedia says to use half of the frequency of the ties (x==score) in
addition to the cumulative frequency of strict x<score.
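
To make the three counting rules concrete, here is a rough pure-Python sketch. The function name, the `kind` argument and its values are hypothetical, chosen only for illustration; this is not the current scipy.stats code nor any of the proposed versions above. It simply counts x<score and x==score and combines them as described.

```python
def percentile_of_score(values, score, kind="mean"):
    """Illustrative sketch only (hypothetical helper, not scipy's code).

    kind="strict": percentage of values with x < score
    kind="weak":   percentage of values with x <= score
    kind="mean":   strict percentage plus half the percentage of ties
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    below = sum(1 for x in values if x < score)   # strictly smaller
    ties = sum(1 for x in values if x == score)   # equal to the score
    if kind == "strict":
        count = below
    elif kind == "weak":
        count = below + ties
    elif kind == "mean":
        count = below + 0.5 * ties
    else:
        raise ValueError("kind must be 'strict', 'weak' or 'mean'")
    return 100.0 * count / n


data = [1, 2, 3, 3, 4]
for kind in ("strict", "weak", "mean"):
    # score 3 occurs twice, score 4 once, score 2.5 not at all
    print(kind, [percentile_of_score(data, s, kind) for s in (3, 4, 2.5)])
```

When no element equals the score, all three rules agree; they differ only in how much of the tie frequency they count.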
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco