[Numpy-discussion] Vectorized percentile function in Numpy (PR #2970)

Sebastian Berg sebastian@sipsolutions....
Wed Apr 24 12:43:40 CDT 2013


On Wed, 2013-04-24 at 12:03 -0400, josef.pktd@gmail.com wrote:
> On Wed, Apr 24, 2013 at 4:11 AM, Sebastian Berg
> <sebastian@sipsolutions.net> wrote:
> > On Tue, 2013-04-23 at 23:33 -0400, josef.pktd@gmail.com wrote:
> >> On Tue, Apr 23, 2013 at 6:16 PM, Sebastian Berg
> >> <sebastian@sipsolutions.net> wrote:
> >> > On Tue, 2013-04-23 at 12:13 -0500, Jonathan Helmus wrote:
> >> >>      Back in December it was pointed out on the scipy-user list[1] that
> >> >> numpy has a percentile function which has similar functionality to
> >> >> scipy's stats.scoreatpercentile.  I've been trying to harmonize these
> >> >> two functions into a single version which has the features of both.
> >> >>      Scipy PR 374[2] introduced a version which took the parameters from
> >> >> both the scipy and numpy percentile functions and was accepted into Scipy
> >> >> with the plan that it would be deprecated when a similar function was
> >> >> introduced into Numpy.  Then I moved to enhancing the Numpy version with
> >> >> Pull Request 2970 [3].  With some input from Sebastian Berg the
> >> >> percentile function was rewritten with further vectorization, but
> >> >> neither of us felt fully comfortable with the final product.  Can
> >> >> someone look at the implementation in the PR and suggest what should be done
> >> >> from here?
> >> >>
> >> >
> >> > Thanks! For me the main question is the vectorized usage when both
> >> > haystack (`a`) and needle (`q`) are vectorized. What I mean is for:
> >> >
> >> > np.percentile(np.random.randn(n1, n2, N), [25., 50., 75.], axis=-1)
> >> >
> >> > I would probably expect an output shape of (n1, n2, 3), but currently
> >> > you will get the needle dimensions first, because it is roughly the same
> >> > as
> >> >
> >> > [np.percentile(np.random.randn(n1, n2, N), q, axis=-1) for q in [25., 50., 75.]]
> >> >
> >> > so for the (probably rare) vectorization of both `a` and `q`, would it
> >> > be preferable to do some kind of long term behaviour change, or just put
> >> > the dimensions in `q` first, which should be compatible with the current
> >> > list?
> >>
> >> I don't have much of a preference either way, but I'm glad this is
> >> going into numpy.
> >> We can work with it either way.
> >>
> >> In stats, the most common case will be axis=0, and then the two are
> >> the same, aren't they?
> >>
> >> What I like about the second version is unrolling (with 2 or 3
> >> quantiles), which I think will work
> >>
> >> u, l = np.random.randn(2,5)
> >> or
> >> res = np.percentile(...)
> >> func(*res)
> >>
> >> The first case will be nicer when there are lots of percentiles, but I
> >> guess I won't need it much except for axis=0.
> >>
> >> Actually, I would prefer the second version, because it might be a bit
> >> more cumbersome to get the individual percentiles out if the axis is
> >> somewhere in the middle, however I don't think I have a case like
> >> that.
> >>
> >
> > I never thought about the axis being where to insert the dimensions of
> > the quantiles. That would be a third option. It feels simpler to me to
> > just always use the end (or the start) though.
> 
> If the choices are start or end, then I prefer start for unpacking.
> 

I missed the reduceat argument; it kind of makes sense to me (and I
guess we will usually have either axis=0 or axis=-1). I was going to
check what searchsorted does, but it doesn't vectorize :).
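
To make the two layouts concrete, here is a rough sketch rather than
anything taken from the PR. The sizes n1, n2, N and the reduceat indices
are arbitrary choices for illustration, and it assumes a NumPy version
that provides np.moveaxis and in which percentile puts the `q` dimension
first:

```python
import numpy as np

# Arbitrary illustrative sizes (not from the PR).
n1, n2, N = 4, 5, 100
a = np.random.randn(n1, n2, N)
q = [25., 50., 75.]

# "q first" layout: the needle dimension leads the result.
res = np.percentile(a, q, axis=-1)

# This matches stacking the per-quantile results from a list comprehension.
stacked = np.stack([np.percentile(a, qi, axis=-1) for qi in q])

# A leading q dimension makes unpacking straightforward.
lower, median, upper = res

# "q last" layout, obtained here by moving the leading axis to the end.
res_last = np.moveaxis(res, 0, -1)

# reduceat, for comparison, keeps the reduced axis in place and sets its
# length to the number of indices.
red = np.add.reduceat(a, [0, 25, 50], axis=-1)

print(res.shape, res_last.shape, red.shape)
```

With the `q` dimension first, `lower, median, upper = res` works
directly; with it last (as reduceat would suggest for axis=-1) one has to
index or move the axis before unpacking.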

Sebastian

> Josef
> 
> >
> > - Sebastian
> >
> >> The first version would be consistent with reduceat, and that would be
> >> more numpythonic. I would go for that in numpy.
> >>
> >> my 2.5c
> >>
> >> Josef
> >>
> >> >
> >> > Regards,
> >> >
> >> > Sebastian
> >> >
> >> >>   Cheers,
> >> >>
> >> >>      - Jonathan Helmus
> >> >>
> >> >>
> >> >> [1] http://thread.gmane.org/gmane.comp.python.scientific.user/33331
> >> >> [2] https://github.com/scipy/scipy/pull/374
> >> >> [3] https://github.com/numpy/numpy/pull/2970
> >> >> _______________________________________________
> >> >> NumPy-Discussion mailing list
> >> >> NumPy-Discussion@scipy.org
> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >> >>
> >> >
> >> >
> >>
> >
> >
> 



