[Numpy-discussion] algorithm for faster median calculation ?
Tue Jan 15 13:49:47 CST 2013
On Jan 15, 2013, at 8:31 PM, Jerome Caron <email@example.com> wrote:
> Dear all,
> I am new to the Numpy-discussion list.
> I would like to pass along some possibly useful information about calculating the median.
> The message below was posted today on the AstroPy mailing list.
> Kind regards
> Jerome Caron
> I think the calculation of median values in Numpy is not optimal. I don't know if there are other libraries that do better?
> On my machine I get these results:
> >>> data = numpy.random.rand(5000,5000)
> >>> t0=time.time();print numpy.ma.median(data);print time.time()-t0
> >>> t0=time.time();print numpy.median(data);print time.time()-t0
> >>> t0=time.time();print aspylib.astro.get_median(data);print time.time()-t0
> [ 0.49984574]
> The median calculation in Aspylib is using C code from Nicolas Devillard (can be found here: http://ndevilla.free.fr/median/index.html) interfaced with ctypes.
> It could be easily re-used for other, more official packages. I think the code also finds quantiles efficiently.
> See: http://www.aspylib.com/
Some of the numpy devs are already discussing how best to implement a fast median for numpy here:
https://github.com/numpy/numpy/issues/1811 "median in average O(n) time"
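For readers wondering what "average O(n) time" means in that ticket title, here is a rough sketch of the general idea (quickselect), in plain Python. This is only an illustration: it is not the code numpy uses or plans to use, and it is not Nicolas Devillard's C code. The names `quickselect` and `median` are made up for this example.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-based) in average O(n) time.

    Illustrative sketch only; not numpy's or aspylib's implementation.
    """
    items = list(values)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        # A random pivot keeps the expected cost linear for any input order.
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        if k < len(lows):
            # The target lies among the elements smaller than the pivot.
            items = lows
        elif k < len(lows) + len(pivots):
            # The target is equal to the pivot.
            return pivot
        else:
            # The target lies among the larger elements; shift k accordingly.
            k -= len(lows) + len(pivots)
            items = [x for x in items if x > pivot]


def median(values):
    """Median of a flat sequence of numbers, using selection instead of a full sort."""
    items = list(values)
    n = len(items)
    if n == 0:
        raise ValueError("median of an empty sequence")
    if n % 2 == 1:
        return quickselect(items, n // 2)
    # Even length: average the two middle elements.
    return 0.5 * (quickselect(items, n // 2 - 1) + quickselect(items, n // 2))
```

Unlike a full sort, which costs O(n log n), each pass discards the part of the data that cannot contain the middle element, so the expected total work is proportional to n. A compiled implementation would partition in place rather than build new lists as this sketch does.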
If you want to get an email when someone posts a comment on that GitHub ticket, sign up for a free GitHub account, then click on "watch thread" at the bottom of that issue.
Note that numpy is BSD-licensed, so they can't take GPL-licensed code.
But I think looking at the method you have in aspylib is OK, so thanks for sharing!