[Numpy-discussion] different percentile implementations ?
Wed Mar 28 06:37:05 CDT 2012
On Wed, Mar 28, 2012 at 5:44 AM, Pierre Haessig wrote:
> On 27/03/2012 18:56, email@example.com wrote:
>> similar to std, var, histogram, ... some functions from scipy.stats
>> are now in numpy.
> Ok, historical reasons then. Fair enough.
> Would a "See also: numpy.percentile" make sense in stats.scoreatpercentile?
of course, there are still many opportunities left to improve the
>> However, in contrast to std, var, I think scoreatpercentile should be
>> enhanced and not removed (similar to histogram), for example my
> I'm not sure I completely understood what was involved in your ticket.
The main point was that scoreatpercentile/quantile in mstats or in
climpy by Pierre GM has a lot more features that should be in a stats
> The overall impression I got is:
> * for a lot of statistical computations, it is not possible and/or
> desirable to have the same code for "regular array" and for
> "masked/nans/... arrays".
I think in most cases a pure ndarray implementation without NaNs or
masks will be much faster, so I wouldn't want to simply replace
stats.stats with stats.mstats; I would rather keep the fast paths.
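
To illustrate what keeping a fast path could look like, here is a
minimal sketch. The helper name percentile_dispatch is hypothetical
(it is not a numpy or scipy function), and for simplicity it flattens
the input instead of handling an axis argument:

```python
import numpy as np

def percentile_dispatch(a, q):
    """Hypothetical helper: take the masked path only when it is needed."""
    if np.ma.is_masked(a):
        # Slow path: at least one value is masked, so drop the masked
        # entries before computing the percentile.
        return np.percentile(a.compressed(), q)
    # Fast path: plain ndarray data, no mask bookkeeping at all.
    return np.percentile(np.asarray(a), q)

plain = [1.0, 2.0, 3.0, 4.0, 5.0]
masked = np.ma.masked_array([1.0, 2.0, 3.0, 4.0, 100.0],
                            mask=[0, 0, 0, 0, 1])
print(percentile_dispatch(plain, 50))
print(percentile_dispatch(masked, 50))
```

The check np.ma.is_masked only looks at whether any value is actually
masked, so a masked array with an empty mask still goes through the
fast path.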
> * However, it would be possible to have the same api, that is: put
> all the entry points in scipy.stats instead of having scipy.stats.mstats
> as a separate api. Did I understand you correctly?
What we should have, but currently don't, is functions in stats.stats
and stats.mstats that share the same signature/API.
Whether we can or should merge functions is still a bit open. In the
scoreatpercentile case implementing the limit keyword (which is
currently broken for 2d arrays) requires masking or something
equivalent, so the easiest is to just use the mstats implementation.
Similarly, truncated statistics like tmean use masked arrays.
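
As a rough sketch of why a limit keyword leads to masking: values
outside the limits are masked out, and the statistic is computed on
what remains. The function names below are hypothetical and this is
not the scipy implementation; it also flattens the input, so it
sidesteps the per-axis 2d case rather than solving it:

```python
import numpy as np

def score_with_limit(a, per, limit=None):
    """Hypothetical sketch: percentile that ignores values outside limit."""
    a = np.asarray(a, dtype=float)
    if limit is not None:
        lo, hi = limit
        # Mask everything outside [lo, hi], then keep only unmasked values.
        a = np.ma.masked_outside(a, lo, hi).compressed()
    return np.percentile(a, per)

def truncated_mean(a, limit):
    """Hypothetical sketch: mean of the values inside [lo, hi]."""
    lo, hi = limit
    masked = np.ma.masked_outside(np.asarray(a, dtype=float), lo, hi)
    return float(masked.mean())

a = [[1, 2, 3], [4, 5, 100]]
print(score_with_limit(a, 50))                  # all six values
print(score_with_limit(a, 50, limit=(0, 10)))   # 100 is excluded
print(truncated_mean(a, (0, 10)))
```

Doing the same along an axis of a 2d array is harder, because each row
or column can end up with a different number of valid values, which is
exactly the situation masked arrays are designed for.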