[Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays
Thu Nov 8 10:55:13 CST 2007
On Thursday, 08 November 2007 at 17:31:40, David Cournapeau wrote:
> This is because the current implementations of at least some of the
> operations you are talking about use PyArray_GenericReduce and
> other similar functions, which are really high-level (they use Python
> callables, etc.). This is easier, because you don't have to care about
> anything (types, etc.), but it means that the Python runtime is
> handling everything.
I suspected as much after your last post, but that's really bad for pointwise
operations on a contiguous, aligned array. A simple transpose() should
really not make any difference here.
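To illustrate the point (a hypothetical sketch, not numpy's actual code): an
elementwise kernel that walks the raw contiguous buffer never sees the logical
axis order at all, so C- and Fortran-contiguous data are handled identically.

```cpp
#include <cstddef>

// Hypothetical elementwise kernel: it operates on the raw contiguous
// buffer, so the logical axis order (C vs. Fortran) of the array never
// enters the picture -- a transposed view of a contiguous array is the
// same n elements in memory.
void scale_inplace(double* data, std::size_t n, double factor) {
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= factor;
}
```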
> Instead, you should use a pure C
> implementation (by pure C, I mean a C function totally independent of
> Python, only dealing with standard C types). This would already lead to a
> significant performance gain.
AFAICS, it would be much more elegant and easier to implement this using C++
templates. We have a lot of experience with such a design from our VIGRA
library ( http://kogs-www.informatik.uni-hamburg.de/~koethe/vigra/ ), which
is an imaging library based on the STL concepts (and some necessary and
convenient extensions for higher-dimensional arrays and a more flexible API).
I am not very keen on writing hundreds of lines of C code for things that can
easily be handled with C++ functors. But I don't think that I am the first
to propose this, and I know that C has some advantages (faster compilation;
are there more? ;-) ) - what is the opinion on this in the SciPy community?
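For concreteness, here is a minimal sketch of the functor-based design I have
in mind (the names are illustrative, not VIGRA's actual API): a single loop
template, instantiated by the compiler for every combination of element type
and operation, replaces the hand-written C loops.

```cpp
#include <cstddef>

// Illustrative functor-based design (not VIGRA's or numpy's actual API):
// one generic loop template covers every (element type, operation) pair;
// the compiler generates the specialized inner loops.
template <typename T>
struct Plus {
    T operator()(T a, T b) const { return a + b; }
};

template <typename T, typename BinaryOp>
void elementwise(const T* a, const T* b, T* out, std::size_t n, BinaryOp op) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = op(a[i], b[i]);
}
```

A call like elementwise(x, y, z, n, Plus&lt;double&gt;()) compiles down to the same
tight loop one would write by hand in C, but adding a new operation costs a
three-line functor rather than a new loop per type.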
> If you have segmented addresses, I don't think the ordering matters
> much anymore for memory access, no?
Yes, I think it does. It probably depends on the sizes of the segments
though. If you have a multi-segment box-sub-range of a large dataset (3D
volume or even only 2D), processing each contiguous "row" (column/...) at
once within the inner loop definitely makes a difference. I.e. as long as
one dimension is not strided (and the data's extent in this dimension is not
too small), it should be handled in the inner loop; the ordering of the
remaining loops probably doesn't make a big difference.
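A sketch of that access pattern (function and parameter names are made up):
only the outer loop jumps by the full dataset's row stride, while the inner
loop runs over the contiguous dimension.

```cpp
#include <cstddef>

// Sketch of processing a box sub-range of a larger 2D dataset. Each row
// of the sub-range is contiguous in memory, so it is handled in the
// inner loop; only the outer loop steps by the (larger) row stride of
// the full dataset.
void fill_subrange(double* base, std::size_t row_stride,
                   std::size_t rows, std::size_t cols, double value) {
    for (std::size_t r = 0; r < rows; ++r) {
        double* row = base + r * row_stride;    // strided outer step
        for (std::size_t c = 0; c < cols; ++c)  // contiguous inner loop
            row[c] = value;
    }
}
```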
Ciao,
Hans