[Numpy-discussion] A faster median (Wirth's method)
Sturla Molden
sturla@molden...
Wed Sep 2 09:54:34 CDT 2009
Dag Sverre Seljebotn wrote:
> a) Is the cast to numpy.npy_intp really needed? I'm pretty sure shape is
> defined as numpy.npy_intp*.
I don't know Cython internals in detail, but you do, so I'll take your
word for it. I thought shape was a tuple of Python ints.
> b) If you want higher performance with contiguous arrays (which occur a
> lot as inplace=False is default I guess) you can do
>
> np.ndarray[T, ndim=1, mode="c"]
>
> to tell the compiler the array is contiguous. That doubles the number of
> function instances though...
Thanks. I could either double the number of specialized select
functions, or I could make a local copy using numpy.ascontiguousarray in
the select function.
Quickselect touches the discontiguous array 2*n times on average, whereas
numpy.ascontiguousarray touches it n times (and in order). Then there is
the question of cache use: contiguous arrays are the friendlier case, and
the sequential pass of numpy.ascontiguousarray is friendlier than the
access pattern of quickselect. Also, if quickselect is not done inplace
(the common case for medians), we always have contiguous arrays, so
mode="c" is almost always wanted. And when quickselect is done inplace,
we usually have a contiguous input. This is also why I used a C pointer
instead of your buffer syntax in the first version. Then I changed my
mind, not sure why. So I'll try with a local copy first. I don't think
we want close to a megabyte of Cython-generated gibberish C just for the
median.
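
To make the local-copy idea concrete, here is a rough sketch in plain
Python/NumPy rather than Cython, so it shows the structure and not the
performance. The name select_kth is made up for illustration, the input
is assumed to be 1-D and free of NaNs, and np.array(..., order="C") is
used instead of numpy.ascontiguousarray so that the caller's array is
never modified:

```python
import numpy as np

def select_kth(a, k):
    """Return the k-th smallest element (0-based) of a 1-D array.

    Hypothetical helper: Wirth's selection method run on a private,
    C-contiguous copy, so strided input is read only once, in order.
    """
    # np.array copies by default; order="C" makes the copy contiguous.
    # numpy.ascontiguousarray would skip the copy for contiguous input,
    # which is what an inplace variant would want instead.
    buf = np.array(a, order="C")
    if buf.ndim != 1 or not 0 <= k < buf.shape[0]:
        raise ValueError("expected a 1-D array and 0 <= k < len(a)")
    lo, hi = 0, buf.shape[0] - 1
    while lo < hi:
        x = buf[k]              # pivot: current occupant of slot k
        i, j = lo, hi
        while True:
            while buf[i] < x:   # skip elements already on the left side
                i += 1
            while x < buf[j]:   # skip elements already on the right side
                j -= 1
            if i <= j:
                buf[i], buf[j] = buf[j], buf[i]
                i += 1
                j -= 1
            if i > j:
                break
        # Keep only the part of the range that still contains slot k.
        if j < k:
            lo = i
        if k < i:
            hi = j
    return buf[k]
```

For an odd-length array a, select_kth(a, len(a) // 2) would give the
median. The mode="c" specialization would then be the only one needed,
because the inner loops only ever see the contiguous copy.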
Sturla Molden