[Numpy-discussion] Vectorized percentile function in Numpy (PR #2970)
Tue Apr 23 22:33:29 CDT 2013
On Tue, Apr 23, 2013 at 6:16 PM, Sebastian Berg wrote:
> On Tue, 2013-04-23 at 12:13 -0500, Jonathan Helmus wrote:
>> Back in December it was pointed out on the scipy-user list that
>> numpy has a percentile function which has similar functionality to
>> scipy's stats.scoreatpercentile. I've been trying to harmonize these
>> two functions into a single version which has the features of both.
>> Scipy PR 374 introduced a version which took the parameters from
>> both the scipy and numpy percentile function and was accepted into Scipy
>> with the plan that it would be deprecated when a similar function was
>> introduced into Numpy. Then I moved to enhancing the Numpy version with
>> Pull Request 2970. With some input from Sebastian Berg the
>> percentile function was rewritten with further vectorization, but
>> neither of us felt fully comfortable with the final product. Can
>> someone look at the implementation in the PR and suggest what should be done
>> from here?
> Thanks! For me the main question is the vectorized usage when both
> haystack (`a`) and needle (`q`) are vectorized. What I mean is for:
> np.percentile(np.random.randn(n1, n2, N), [25., 50., 75.], axis=-1)
> I would probably expect an output shape of (n1, n2, 3), but currently
> you will get the needle dimensions first, because it is roughly the same as:
> [np.percentile(np.random.randn(n1, n2, N), q, axis=-1) for q in [25., 50., 75.]]
> so for the (probably rare) vectorization of both `a` and `q`, would it
> be preferable to do some kind of long term behaviour change, or just put
> the dimensions in `q` first, which should be compatible to the current
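For reference, here is a minimal sketch of the two candidate output shapes. It assumes a recent NumPy release, where as far as I know the `q` dimensions come first. The sizes for n1, n2 and N are made up, and `np.moveaxis` is only used to illustrate the alternative layout, not because the PR does it that way:

```python
import numpy as np

n1, n2, N = 4, 5, 100            # illustrative sizes, not from the thread
rng = np.random.default_rng(0)
a = rng.standard_normal((n1, n2, N))
q = [25., 50., 75.]

# Needle (q) dimensions first: one (n1, n2) slab per percentile.
q_first = np.percentile(a, q, axis=-1)

# The list-comprehension equivalent quoted above.
stacked = np.array([np.percentile(a, qi, axis=-1) for qi in q])

# Alternative layout: q takes the place of the reduced axis.
q_last = np.moveaxis(q_first, 0, -1)

print(q_first.shape, q_last.shape)
```

Both arrays hold the same numbers; only the position of the length-3 axis differs.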
I don't have much of a preference either way, but I'm glad this is
going into numpy.
We can work with it either way.
In stats, the most common case will be axis=0, and then the two are
the same, aren't they?
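A quick check of the axis=0 case, as a sketch with made-up data: putting `q` first and putting `q` in place of the reduced axis should give the same array when that axis is the first one.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 4))   # 50 observations of 4 variables (made-up sizes)
q = [25., 50., 75.]

# Convention 1: q dimensions first.
q_first = np.percentile(x, q, axis=0)

# Convention 2: q replaces the reduced axis (here axis 0), built by hand.
in_place = np.stack([np.percentile(x, qi, axis=0) for qi in q], axis=0)

print(q_first.shape, in_place.shape)
print(np.allclose(q_first, in_place))
```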
What I like about the second version is unrolling (with 2 or 3
quantiles), which I think will work:
u, l = np.random.randn(2,5)
res = np.percentile(...)
The first case will be nicer when there are lots of percentiles, but I
guess I won't need it much except for axis=0.
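To make the unrolling point concrete, here is a small sketch. The array shape and the 2.5/97.5 bounds are only examples, and it assumes the `q`-first output that current NumPy returns:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 5, 100))   # made-up shape

# With q first, tuple unpacking iterates over the percentiles directly.
lower, upper = np.percentile(a, [2.5, 97.5], axis=-1)
print(lower.shape, upper.shape)

# If q sat on the last axis instead, each percentile would have to be
# indexed out explicitly.
q_last = np.moveaxis(np.percentile(a, [2.5, 97.5], axis=-1), 0, -1)
lower2, upper2 = q_last[..., 0], q_last[..., 1]
```

With many percentiles the unpacking is less useful, and indexing along a single axis is about as convenient either way.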
Actually, I would prefer the second version, because it might be a bit
more cumbersome to get the individual percentiles out if the axis is
somewhere in the middle, however I don't think I have a case like
The first version would be consistent with reduceat, and that would be
more numpythonic. I would go for that in numpy.
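For comparison, this is how `reduceat` treats the reduced axis. It is a sketch with arbitrary data and segment starts: the axis stays where it is and takes the length of the index list.

```python
import numpy as np

a = np.arange(2 * 3 * 8).reshape(2, 3, 8)
idx = [0, 2, 5]                  # made-up segment starts along the last axis

# Sums over a[..., 0:2], a[..., 2:5] and a[..., 5:].
seg = np.add.reduceat(a, idx, axis=-1)
print(seg.shape)                 # (2, 3, 3): last axis now has len(idx) entries
```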
>> - Jonathan Helmus
>>  http://thread.gmane.org/gmane.comp.python.scientific.user/33331
>>  https://github.com/scipy/scipy/pull/374
>>  https://github.com/numpy/numpy/pull/2970
>> NumPy-Discussion mailing list