[SciPy-User] scoreatpercentile behaviour

Andreas Hilboll lists@hilboll...
Thu Jan 24 11:46:04 CST 2013


On 24.01.2013 18:41, josef.pktd@gmail.com wrote:
> On Thu, Jan 24, 2013 at 11:59 AM, Andreas Hilboll <lists@hilboll.de> wrote:
>> I just had a quick look into scipy.stats.scoreatpercentile, and was
>> disappointed to see that it's currently not possible to do the
>> calculation for more than one percentile at a time (``per`` is scalar).
>> So I had a quick look into the sources, and was surprised to see that
>> apparently, the function expects ordered input ``a``, which is not noted
>> in the docstring. (Or maybe it's just my misunderstanding of the word
>> 'percentile'. I had expected the function to work on the input's
>> **values**, not on the indices.)
> 
> I'm not sure what you mean here:
> ``a`` is sorted by the function, and then we take the n*per smallest
> value (roughly; it interpolates).
> That gives you the quantile value of the input array.
> 
>>
>> Is this a bug or a feature? If it's a feature, this should be very
>> explicitly noted in the docstring, I think. I'm willing to do so if you
>> can confirm that the current behaviour is actually wanted.
>>
>> In the sources' TODO, it's stated that a more general percentile
>> implementation would be welcome. I might be able to contribute something
>> here; any hints on where to start?
> 
> there is a pull request that follows the numpy implementation
> https://github.com/scipy/scipy/pull/374
> 
> stats.mstats has different options
> stats.mstats.scoreatpercentile and stats.mstats.mquantiles
> 
> (I also wrote a draft for a fully vectorized version of it.)
> 
> It's one of those functions where I don't like the current
> implementation much, but don't know what the alternative should be.
> For example in statsmodels we also use stats.mstats.mquantiles because
> it has interpolation and an axis option.
> 
> (So, I'm staying partially on the sidelines on this.)
> 
> Josef

Sorry for the noise, it turns out I was just too blind / tired /
whatever to notice the very first line of the function

   values = np.sort(a, axis=0)

My dumb fault. Thanks for making me realize.

Andreas.
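
For anyone finding this thread later, here is a minimal pure-Python sketch of
the behaviour discussed above: sort a copy of the input, then pick (or
linearly interpolate) the value at the requested percentile. The helper name
score_at_percentile_sketch and the interpolation index per/100 * (n - 1) are
assumptions made for illustration; this is not the SciPy source, and the real
function may differ in its details.

```python
import math


def score_at_percentile_sketch(a, per):
    """Hypothetical helper for illustration; not the SciPy implementation."""
    values = sorted(a)  # works on a sorted copy, so input order is irrelevant
    if not values:
        raise ValueError("input must not be empty")
    if not 0 <= per <= 100:
        raise ValueError("per must lie in [0, 100]")
    # Fractional position of the percentile within the sorted values
    # (assumed convention: 0 maps to the minimum, 100 to the maximum).
    idx = per / 100.0 * (len(values) - 1)
    lo = math.floor(idx)
    hi = math.ceil(idx)
    # Linear interpolation between the two neighbouring sorted values.
    return values[lo] + (values[hi] - values[lo]) * (idx - lo)


data = [7, 1, 5, 3, 9]
shuffled = [9, 3, 7, 5, 1]

# Same result regardless of the order of the input.
median = score_at_percentile_sketch(data, 50)
median_shuffled = score_at_percentile_sketch(shuffled, 50)

# Several percentiles at once, by looping over a scalar-only function.
quartiles = [score_at_percentile_sketch(data, p) for p in (25, 50, 75)]
```

Because the sketch sorts first, the result depends only on the values in the
input, never on their positions. The list comprehension is one way to work
around a scalar-only ``per`` argument.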


