[Numpy-discussion] algorithm for faster median calculation ?
Tue Jan 15 13:50:21 CST 2013
You might want to look at this first:
Yes it is possible to compute the median faster by doing quickselect
instead of quicksort. Best case O(n) for quickselect, O(n log n) for
quicksort. But adding selection and partial sorting to NumPy is a bigger
issue than just computing medians and percentiles faster.
If we are to do this I think we should add partial sorting and selection
to npysort, not patch in some C or Cython quickselect just for the
median. When npysort has quickselect, changing the Python code to use it
for medians and percentiles is a nobrainer.
On 15.01.2013 20:31, Jerome Caron wrote:
> Dear all,
> I am new to the Numpy-discussion list.
> I would like to follow up some possibly useful information about
> calculating median.
> The message below was posted today on the AstroPy mailing list.
> Kind regards
> Jerome Caron
> I think the calculation of median values in Numpy is not optimal. I
> don't know if there are other libraries that do better?
> On my machine I get these results:
> >>> data = numpy.random.rand(5000,5000)
> >>> t0=time.time();print numpy.ma.median(data);print time.time()-t0
> >>> t0=time.time();print numpy.median(data);print time.time()-t0
> >>> t0=time.time();print aspylib.astro.get_median(data);print
> [ 0.49984574]
> The median calculation in Aspylib is using C code from Nicolas Devillard
> (can be found here: http://ndevilla.free.fr/median/index.html
> <http://ndevilla.free.fr/median/index.html>) interfaced with ctypes.
> It could be easily re-used for other, more official packages. I think
> the code also finds quantiles efficiently.
> See: http://www.aspylib.com/
> NumPy-Discussion mailing list
More information about the NumPy-Discussion