[Numpy-discussion] Need help for implementing a fast clip in numpy (was slow clip)
david at ar.media.kyoto-u.ac.jp
Thu Jan 11 23:08:29 CST 2007
Christopher Barker wrote:
>> autogen works well enough for me;
> I didn't know about autogen -- that may be all we need.
numpy has code which already does something similar to autogen: you
declare a function, and some template with a generic name, and the code
generator replaces the generic name and type with some values. All the
.src files in numpy/core follow this pattern.
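To illustrate the idea (not numpy's actual generator), here is a toy sketch of that kind of template expansion. The @name@/@type@ placeholder style is only loosely modeled on the .src files, and the expand helper and the FLOAT_fastclip/DOUBLE_fastclip names are made up for this example:

```python
# Toy template expander: a sketch of the idea only, NOT numpy's real
# code generator. Placeholder syntax and all names are illustrative.

TEMPLATE = """\
static void @name@_fastclip(@type@ *in, long n, @type@ lo, @type@ hi, @type@ *out)
{
    long i;
    for (i = 0; i < n; ++i) {
        out[i] = in[i] < lo ? lo : (in[i] > hi ? hi : in[i]);
    }
}
"""


def expand(template, substitutions):
    """Return one copy of `template` per dict, with each @key@ replaced."""
    chunks = []
    for subst in substitutions:
        text = template
        for key, value in subst.items():
            text = text.replace("@" + key + "@", value)
        chunks.append(text)
    return "\n".join(chunks)


# One generic declaration, expanded into one C function per type.
generated = expand(TEMPLATE, [
    {"name": "FLOAT", "type": "float"},
    {"name": "DOUBLE", "type": "double"},
])
print(generated)
```

The point is that the clip loop is written once and the per-type variants are produced mechanically.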
>> Now, I didn't know that clip was supposed to handle arrays as min/max
> one more nifty feature...And if you want to support broadcasting, even
> more so!
>> At first, I didn't understand the need to care about
>> contiguous/non contiguous; having non scalar for min/max makes it
>> necessary to have special case for non contiguous.
> I'm confused. This issue is that you can't just increment the pointer to
> get the next element if the array is non-contiguous.. you need to do all
> the strides, etc, math.
Ok, so we don't mean the same thing by contiguous, and I should check
that my definition is the actual one... For me, contiguous means that
the array has C order, and a non contiguous array has a 'random' order,
but you can still get to the next element in the buffer by using standard
C array addressing. In my mind, contiguous is about the relationship
between the indexing of the array in C and the math indexing.
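To make the stride point concrete, here is a small pure-Python sketch (no numpy, so the names element, c_strides and t_strides are assumptions of this example, not numpy API). It stores a 3x4 array of doubles in a flat buffer and compares stride-based addressing on a transposed view with a naive walk that just increments the pointer:

```python
import struct

ITEMSIZE = struct.calcsize("d")  # size of one C double in bytes

# A 3x4 array of doubles in C order, stored in a flat byte buffer.
rows, cols = 3, 4
buf = struct.pack("%dd" % (rows * cols), *range(rows * cols))

# Byte strides for the C-ordered array: last index varies fastest.
c_strides = (cols * ITEMSIZE, ITEMSIZE)


def element(buf, strides, index, offset=0):
    """Read one double at a multi-dimensional index using byte strides."""
    pos = offset + sum(i * s for i, s in zip(index, strides))
    return struct.unpack_from("d", buf, pos)[0]


# A transposed view shares the same buffer but swaps the strides, so it
# is no longer C contiguous.
t_strides = (ITEMSIZE, cols * ITEMSIZE)

# Visiting the transposed view in logical order needs the stride math...
walk = [element(buf, t_strides, (i, j))
        for i in range(cols) for j in range(rows)]

# ...whereas simply incrementing the pointer visits storage order.
naive = [struct.unpack_from("d", buf, k * ITEMSIZE)[0]
         for k in range(rows * cols)]

print(walk[:4])   # first elements of the transposed view
print(naive[:4])  # first elements in storage order
```

For the C-ordered array the two walks coincide, which is why a plain pointer increment is enough in that case.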
According to the numpy ebook, the data buffer may:
- not be aligned on word boundaries -> NPY_ALIGNED
- not be native endianness -> NPY_ISNOTSWAPPED
- not be C contiguous (last index does not vary fastest) -> NPY_CONTIGUOUS.
I thought that as long as NPY_ALIGNED is true, you are sure that
array->data[i] is the ith element of the buffer with the datatype of the
array.
If the data are not aligned or not native endian, I just use the
existing implementation; if you are not using the CPU endianness or
alignment, you cannot expect to do things at a decent speed anyway.
In my code, I check alignment, endianness and the scalar case. If
any of these conditions is not true, I just rely on the old implementation
for now, which should make it easy to extend if necessary.
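As a rough sketch of that dispatch logic (in Python rather than the actual C code, with ArrayInfo, clip_fast and clip_generic being hypothetical stand-ins invented for this example):

```python
from dataclasses import dataclass
from numbers import Real


@dataclass
class ArrayInfo:
    """Hypothetical stand-in for an array plus the flags being checked."""
    data: list
    aligned: bool = True
    native_byteorder: bool = True


def clip_generic(arr, lo, hi):
    """Stand-in for the existing general implementation (slow path).

    Accepts scalar or per-element sequences for lo and hi.
    """
    n = len(arr.data)
    los = [lo] * n if isinstance(lo, Real) else list(lo)
    his = [hi] * n if isinstance(hi, Real) else list(hi)
    return [min(max(x, l), h) for x, l, h in zip(arr.data, los, his)]


def clip_fast(arr, lo, hi):
    """Stand-in for the specialized loop: scalar bounds, plain indexing."""
    return [lo if x < lo else hi if x > hi else x for x in arr.data]


def clip(arr, lo, hi):
    """Take the fast path only when every condition holds."""
    fast_ok = (arr.aligned and arr.native_byteorder
               and isinstance(lo, Real) and isinstance(hi, Real))
    if fast_ok:
        return clip_fast(arr, lo, hi)
    # Any unmet condition falls back to the old implementation.
    return clip_generic(arr, lo, hi)


print(clip(ArrayInfo([1, 5, 9]), 2, 8))
print(clip(ArrayInfo([1, 5, 9]), [0, 6, 0], [10, 10, 3]))
```

Keeping the fallback in one place means a new special case only has to add one more condition and one more specialized loop.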