[Numpy-discussion] algorithm for faster median calculation ?

Sturla Molden sturla@molden...
Tue Jan 15 13:50:21 CST 2013


You might want to look at this first:

https://github.com/numpy/numpy/issues/1811

Yes it is possible to compute the median faster by doing quickselect 
instead of quicksort. Best case O(n) for quickselect, O(n log n) for 
quicksort. But adding selection and partial sorting to NumPy is a bigger 
issue than just computing medians and percentiles faster.

If we are to do this I think we should add partial sorting and selection 
to npysort, not patch in some C or Cython quickselect just for the 
median. When npysort has quickselect, changing the Python code to use it 
for medians and percentiles is a nobrainer.

https://github.com/numpy/numpy/tree/master/numpy/core/src/npysort



Sturla



On 15.01.2013 20:31, Jerome Caron wrote:
> Dear all,
> I am new to the Numpy-discussion list.
> I would like to follow up some possibly useful information about
> calculating median.
> The message below was posted today on the AstroPy mailing list.
> Kind regards
> Jerome Caron
> #----------------------------------------
> I think the calculation of median values in Numpy is not optimal. I
> don't know if there are other libraries that do better?
> On my machine I get these results:
>  >>> data = numpy.random.rand(5000,5000)
>  >>> t0=time.time();print numpy.ma.median(data);print time.time()-t0
> 0.499845739822
> 15.1949999332
>  >>> t0=time.time();print numpy.median(data);print time.time()-t0
> 0.499845739822
> 4.32100009918
>  >>> t0=time.time();print aspylib.astro.get_median(data);print
> time.time()-t0
> [ 0.49984574]
> 0.90499997139
>  >>>
> The median calculation in Aspylib is using C code from Nicolas Devillard
> (can be found here: http://ndevilla.free.fr/median/index.html
> <http://ndevilla.free.fr/median/index.html>) interfaced with ctypes.
> It could be easily re-used for other, more official packages. I think
> the code also finds quantiles efficiently.
> See: http://www.aspylib.com/
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion



More information about the NumPy-Discussion mailing list