[Numpy-discussion] A better median function?

Chad Netzer chad.netzer@gmail....
Fri Aug 21 13:10:51 CDT 2009


On Fri, Aug 21, 2009 at 8:47 AM, Mike Ressler<mike.ressler@alum.mit.edu> wrote:
> I presented this during a lightning talk at the scipy conference
> yesterday, so again, at the risk of painting myself as a flaming
> idiot:
>
> ---------------------
> Wanted: A Better/Faster median() Function
>
> numpy implementation uses simple sorting algorithm:
> Sort all the data using the .sort() method
> Return middle value (or mean of two middle values)

Michael and I also discussed this briefly this morning at the SciPy
conference.  I'll summarize a bit:

scipy.signals has a medianfilter() implemented in C which uses the
Hoare selection algorithm:

http://en.wikipedia.org/wiki/Selection_algorithm#Partition-based_general_selection_algorithm

This *may* suit Michael's needs.  More generally, C++ std lib has both
partial_sort() and nth_element(), both of which would be "nice"
functionality to have natively (ie. C speeds) in numpy, imo:

http://www.cplusplus.com/reference/algorithm/partial_sort/
http://www.cplusplus.com/reference/algorithm/nth_element/

Since both can be made from a modified subset of quicksort, it should
be straightforward to craft these methods from the existing numpy
quicksort code.  With these primitives, it is trivial to improve the
existing numpy median() implementations.   I'll probably attempt to
tackle it myself, unless anyone else has done it (or has a better
idea).

Certainly, the ability to quickly find the minimal or maximal n
elements of a sequence, without having to perform a full sort, would
be of use to many numpy users.  Has this problem already been solved
in numpy?

-Chad


More information about the NumPy-Discussion mailing list