[Numpy-discussion] A faster median (Wirth's method)

Sturla Molden sturla@molden...
Wed Sep 2 09:54:34 CDT 2009


Dag Sverre Seljebotn wrote:

 > a) Is the cast to numpy.npy_intp really needed? I'm pretty sure shape is
 > defined as numpy.npy_intp*.

I don't know Cython internals in detail, but you do, so I take your word
for it. I thought shape was a tuple of Python ints.


 > b) If you want higher performance with contiguous arrays (which occur a
 > lot as inplace=False is default I guess) you can do
 >
 > np.ndarray[T, ndim=1, mode="c"]
 >
 > to tell the compiler the array is contiguous. That doubles the number of
 > function instances though...

Thanks. I could either double the number of specialized select 
functions, or I could make a local copy using numpy.ascontiguousarray in 
the select function.
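A minimal sketch of the local-copy option, in plain NumPy rather than
Cython (the array size and variable names are arbitrary). It shows that
numpy.ascontiguousarray hands back the same buffer when the input is
already C-contiguous, and builds a fresh contiguous copy in one ordered
pass when it is not, so a select routine written for mode="c" could
always be fed contiguous memory.

```python
import numpy as np

a = np.arange(10.0)      # freshly allocated, so C-contiguous
view = a[::2]            # strided view: every other element, not contiguous

print(a.flags['C_CONTIGUOUS'], view.flags['C_CONTIGUOUS'])  # True False

c1 = np.ascontiguousarray(a)     # already contiguous: no copy is made
c2 = np.ascontiguousarray(view)  # one sequential pass builds a new buffer

print(np.shares_memory(c1, a))   # True: same underlying buffer as a
print(np.shares_memory(c2, a))   # False: c2 is an independent copy
print(c2.flags['C_CONTIGUOUS'])  # True
```

The copy costs one extra buffer of n elements, which is the trade-off
against compiling a second, contiguous-only set of select functions.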

Quickselect touches the discontiguous array on average 2*n times, whereas
numpy.ascontiguousarray touches it only n times, and in order. Then there
is the question of cache use: contiguous arrays are the more
cache-friendly case, and numpy.ascontiguousarray is friendlier to the
cache than quickselect. Also, if quickselect is not done in place (the
common case for medians), we always have contiguous arrays, so mode="c"
is almost always wanted. And when quickselect is done in place, the input
is usually contiguous anyway. This is also why I used a C pointer instead
of your buffer syntax in the first version. Then I changed my mind, not
sure why. So I'll try a local copy first. I don't think we want close to
a megabyte of Cython-generated gibberish C just for the median.
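For readers who have not seen Wirth's method, here is a pure-Python
sketch of the selection step (the classic formulation attributed to
Niklaus Wirth) and a median built on it. It is an illustration under
assumptions, not the Cython code discussed in this thread: the names
wirth_select and wirth_median are hypothetical, the median always works
on a private copy made with numpy.array (so the data is contiguous, as in
the inplace=False case above), and NaNs are not handled.

```python
import numpy as np


def wirth_select(a, k):
    """Return the k-th smallest element (0-based) of the 1-D array a.

    a is partially reordered in place: afterwards a[:k] <= a[k] <= a[k+1:].
    """
    lo, hi = 0, len(a) - 1
    while lo < hi:
        x = a[k]                  # pivot value (a scalar copy, not a view)
        i, j = lo, hi
        while True:
            while a[i] < x:       # scan right for an element >= pivot
                i += 1
            while x < a[j]:       # scan left for an element <= pivot
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
            if i > j:
                break
        # Keep searching only in the partition that still contains index k.
        if j < k:
            lo = i
        if k < i:
            hi = j
    return a[k]


def wirth_median(x):
    """Median of a 1-D sequence, computed on a private contiguous copy."""
    a = np.array(x)               # copy: the caller's data is never reordered
    if a.ndim != 1 or a.size == 0:
        raise ValueError("expected a non-empty 1-D input")
    n = a.size
    k = n // 2
    upper = wirth_select(a, k)
    if n % 2:
        return upper
    # Even length: everything left of index k is <= a[k], so the other
    # middle element is simply the largest value in a[:k].
    lower = a[:k].max()
    return (lower + upper) / 2


print(wirth_median([4, 1, 3, 2]))  # 2.5
```

The point of the sketch is the access pattern, not speed: interpreted
Python like this is far slower than a compiled version of the same loop.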

Sturla Molden

