[SciPy-dev] percentileofscore in svn
josef.pktd@gmai...
Sun Nov 23 20:59:39 CST 2008
>
> Hi Josef,
>
> Is there a reason why you couldn't implement percentileofscore() with
> numpy's searchsorted()? That would give you vectorization and more
> efficiently handle large #s of bins.
>
> Nathan Bell wnbell@gmail.com
The reason is that I had never used searchsorted, and I still don't have
an overview of which functions are available in numpy/scipy.
But thank you for the hint: once I found the side='left' and side='right'
options, searchsorted works perfectly. It also makes it easy to get the
empirical cumulative frequency, and the frequency count directly.
It requires a sort, which would be a waste if I just needed the cdf for
a single value, but then I wouldn't need a function.
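For the single-value case mentioned above, a minimal sketch without any sort could look like the following. The array `a` and the value `score` are made-up illustration data, not anything from svn.

```python
import numpy as np

# Illustration data (assumed, not taken from the svn code).
a = np.array([1, 2, 3, 3, 4, 5, 6, 7, 8, 9])
score = 3

# Weak inequality (cdf): percentage of values <= score.
weak = 100.0 * np.count_nonzero(a <= score) / a.size

# Strict inequality: percentage of values < score.
strict = 100.0 * np.count_nonzero(a < score) / a.size
```

This is a single O(n) pass per score, so it only pays off when very few scores are needed; for many scores, sorting once and calling searchsorted is the better fit.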
The same options that I added to percentileofscore can be calculated easily:
>>> hi = np.searchsorted([1,2,3,3,4,5,6,7,8,9], [1,2,3,4,5,6,7,8,9], side='right')
>>> lo = np.searchsorted([1,2,3,3,4,5,6,7,8,9], [1,2,3,4,5,6,7,8,9], side='left')
# rank ordering
>>> hi
array([ 1, 2, 4, 5, 6, 7, 8, 9, 10])
>>> lo
array([0, 1, 2, 4, 5, 6, 7, 8, 9])
>>> hi-lo
array([1, 1, 2, 1, 1, 1, 1, 1, 1])
# percentiles of scores
>>> n=10
>>> (lo+0.5*(hi-lo))/float(n)*100 # mean wikipedia
array([ 5., 15., 30., 45., 55., 65., 75., 85., 95.])
>>> (0.5*(hi+1+lo))/float(n)*100 # rank (mean rank)
array([ 10., 20., 35., 50., 60., 70., 80., 90., 100.])
>>> hi/float(n)*100 # weak inequality (cdf)
array([ 10., 20., 40., 50., 60., 70., 80., 90., 100.])
>>> lo/float(n)*100 # strict inequality
array([ 0., 10., 20., 40., 50., 60., 70., 80., 90.])
>>> hi/float(n)*100-lo/float(n)*100 # frequencies in percent
array([ 10., 10., 20., 10., 10., 10., 10., 10., 10.])
>>>
Not properly tested yet, but it looks good.
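As a rough sketch, the searchsorted formulas above could be wrapped into one vectorized helper. The function name `percentileofscore_sketch` and the `kind` labels are hypothetical, chosen only for this example; this is not the implementation in svn, and it assumes a non-empty one-dimensional input.

```python
import numpy as np

def percentileofscore_sketch(a, scores, kind="mean"):
    """Hypothetical vectorized helper; name and `kind` labels are illustrative."""
    a = np.sort(np.asarray(a))      # searchsorted needs sorted input
    scores = np.asarray(scores)
    n = a.size                      # assumed to be > 0
    lo = np.searchsorted(a, scores, side="left")    # number of values < score
    hi = np.searchsorted(a, scores, side="right")   # number of values <= score
    if kind == "mean":        # midpoint of strict and weak counts
        counts = lo + 0.5 * (hi - lo)
    elif kind == "rank":      # mean rank of tied values
        counts = 0.5 * (hi + 1 + lo)
    elif kind == "weak":      # weak inequality (cdf)
        counts = hi
    elif kind == "strict":    # strict inequality
        counts = lo
    else:
        raise ValueError("kind must be 'mean', 'rank', 'weak' or 'strict'")
    return counts / float(n) * 100

data = [1, 2, 3, 3, 4, 5, 6, 7, 8, 9]
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
mean_pct = percentileofscore_sketch(data, scores, kind="mean")
rank_pct = percentileofscore_sketch(data, scores, kind="rank")
weak_pct = percentileofscore_sketch(data, scores, kind="weak")
strict_pct = percentileofscore_sketch(data, scores, kind="strict")
```

With the example data, the four results should match the arrays printed in the session above. The "rank" formula is copied as-is and has only been checked here for scores that occur in the data.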
Josef