[Numpy-discussion] algorithm for faster median calculation ?

Tue Jan 15 13:49:47 CST 2013

```On Jan 15, 2013, at 8:31 PM, Jerome Caron <jerome_caron_astro@ymail.com> wrote:

> Dear all,
> I am new to the Numpy-discussion list.
> I would like to pass along some possibly useful information about calculating the median.
> The message below was posted today on the AstroPy mailing list.
> Kind regards
> Jerome Caron
>
> #----------------------------------------
> I think the calculation of median values in Numpy is not optimal. I don't know if there are other libraries that do better?
> On my machine I get these results:
> >>> data = numpy.random.rand(5000,5000)
> >>> t0=time.time();print numpy.ma.median(data);print time.time()-t0
> 0.499845739822
> 15.1949999332
> >>> t0=time.time();print numpy.median(data);print time.time()-t0
> 0.499845739822
> 4.32100009918
> >>> t0=time.time();print aspylib.astro.get_median(data);print time.time()-t0
> [ 0.49984574]
> 0.90499997139
> >>>
> The median calculation in Aspylib is using C code from Nicolas Devillard (can be found here: http://ndevilla.free.fr/median/index.html) interfaced with ctypes.
> It could be easily re-used for other, more official packages. I think the code also finds quantiles efficiently.
> See: http://www.aspylib.com/
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

Hi Jerome,

Some of the NumPy devs are already discussing how best to implement a fast median for NumPy here:
https://github.com/numpy/numpy/issues/1811 "median in average O(n) time"

If you want to get an email when someone posts a comment on that GitHub ticket, sign up for a free GitHub account, then click on "watch thread" at the bottom of that issue.