[Numpy-discussion] A faster median (Wirth's method)
Dag Sverre Seljebotn
dagss@student.matnat.uio...
Tue Sep 1 14:51:58 CDT 2009
Sturla Molden wrote:
> We recently has a discussion regarding an optimization of NumPy's median
> to average O(n) complexity. After some searching, I found out there is a
> selection algorithm competitive in speed with Hoare's quick select. It
> has the advantage of being a lot simpler to implement. In plain Python:
>
> import numpy as np
>
> def wirthselect(array, k):
>
> """ Niklaus Wirth's selection algortithm """
>
> a = np.ascontiguousarray(array)
> if (a is array): a = a.copy()
>
> l = 0
> m = a.shape[0] - 1
> while l < m:
> x = a[k]
> i = l
> j = m
> while 1:
> while a[i] < x: i += 1
> while x < a[j]: j -= 1
> if i <= j:
> tmp = a[i]
> a[i] = a[j]
> a[j] = tmp
> i += 1
> j -= 1
> if i > j: break
> if j < k: l = i
> if k < i: m = j
>
> return a
>
>
> Now, the median can be obtained in average O(n) time as:
>
>
> def median(x):
>
> """ median in average O(n) time """
>
> n = x.shape[0]
> k = n >> 1
> s = wirthselect(x, k)
> if n & 1:
> return s[k]
> else:
> return 0.5*(s[k]+s[:k].max())
>
>
> The beauty of this is that Wirth select is extremely easy to migrate to
> Cython:
>
>
> import numpy
> ctypedef numpy.double_t T # or whatever
>
> def wirthselect(numpy.ndarray[T, ndim=1] array, int k):
>
> cdef int i, j, l, m
> cdef T x, tmp
> cdef T *a
>
> _array = np.ascontiguousarray(array)
> if (_array is array): _array = _array.copy()
> a = <T *> _array.data
>
> l = 0
> m = <int> a.shape[0] - 1
Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the
right type to use in this case?
--
Dag Sverre
More information about the NumPy-Discussion
mailing list