[SciPy-dev] percentileofscore in svn

josef.pktd@gmai... josef.pktd@gmai...
Sun Nov 23 20:59:39 CST 2008


>
> Hi Josef,
>
> Is there a reason why you couldn't implement percentileofscore() with
> numpy's searchsorted()?  That would give you vectorization and more
> efficiently handle large #s of bins.
>
> Nathan Bell wnbell@gmail.com


The reason is that I had never used searchsorted, and I still don't have
an overview of which functions are available in numpy/scipy.

But thank you for the hint. Once I found the side='left' and
side='right' options, searchsorted worked perfectly. It also makes it
easy to get the empirical cumulative frequency, and the frequency
counts directly. It requires sorted data, and sorting would be a waste
if I only needed the cdf for a single value, but in that case I
wouldn't need a function.
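
To make the trade-off concrete, here is a minimal sketch, assuming
unsorted data. The helper names cdf_single and cdf_many are made up for
illustration and are not part of numpy or scipy.

```python
import numpy as np

def cdf_single(a, score):
    """Percent of values in a that are <= score (one pass, no sort)."""
    a = np.asarray(a)
    return 100.0 * np.count_nonzero(a <= score) / a.size

def cdf_many(a, scores):
    """Same quantity for many scores: sort once, then binary search."""
    a_sorted = np.sort(np.asarray(a))
    # side='right' counts the elements that are <= each score
    hi = np.searchsorted(a_sorted, scores, side='right')
    return 100.0 * hi / a_sorted.size
```

For one score the direct count is enough; with many scores the single
sort is amortized over all of the lookups.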

The same options that I added to percentileofscore can easily be calculated:

>>> import numpy as np
>>> a = [1,2,3,3,4,5,6,7,8,9]      # data, already sorted
>>> scores = [1,2,3,4,5,6,7,8,9]
>>> hi = np.searchsorted(a, scores, side='right')
>>> lo = np.searchsorted(a, scores, side='left')
>>> # rank ordering
>>> hi
array([ 1,  2,  4,  5,  6,  7,  8,  9, 10])
>>> lo
array([0, 1, 2, 4, 5, 6, 7, 8, 9])
>>> hi-lo
array([1, 1, 2, 1, 1, 1, 1, 1, 1])

>>> # percentiles of scores
>>> n=10
>>> (lo+0.5*(hi-lo))/float(n)*100     # mean wikipedia
array([  5.,  15.,  30.,  45.,  55.,  65.,  75.,  85.,  95.])
>>> (0.5*(hi+1+lo))/float(n)*100    # rank (mean rank)
array([  10.,   20.,   35.,   50.,   60.,   70.,   80.,   90.,  100.])
>>> hi/float(n)*100       # weak inequality (cdf)
array([  10.,   20.,   40.,   50.,   60.,   70.,   80.,   90.,  100.])
>>> lo/float(n)*100      # strict inequality
array([  0.,  10.,  20.,  40.,  50.,  60.,  70.,  80.,  90.])
>>> hi/float(n)*100-lo/float(n)*100  # frequencies in percent
array([ 10.,  10.,  20.,  10.,  10.,  10.,  10.,  10.,  10.])

Not properly tested yet, but it looks good.
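
Wrapped into a function, the calculations above might look like the
following sketch. The function name percentile_of_scores and the kind
labels are assumptions chosen to mirror the comments in the session;
this is not the code in svn.

```python
import numpy as np

def percentile_of_scores(a, scores, kind='mean'):
    """Vectorized percentile of each score relative to the data in a.

    kind (hypothetical labels): 'mean', 'rank', 'weak' or 'strict'.
    """
    a_sorted = np.sort(np.asarray(a))
    n = a_sorted.size
    lo = np.searchsorted(a_sorted, scores, side='left')   # count of a < score
    hi = np.searchsorted(a_sorted, scores, side='right')  # count of a <= score
    if kind == 'mean':
        pos = 0.5 * (lo + hi)        # average of strict and weak
    elif kind == 'rank':
        pos = 0.5 * (lo + hi + 1)    # mean rank of tied values
    elif kind == 'weak':
        pos = hi                     # weak inequality (cdf)
    elif kind == 'strict':
        pos = lo                     # strict inequality
    else:
        raise ValueError("kind must be 'mean', 'rank', 'weak' or 'strict'")
    return 100.0 * pos / n
```

Called with the data and scores from the session, each kind should
reproduce the corresponding array shown above.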

Josef

