[Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

Travis E. Oliphant oliphant@enthought....
Fri Nov 9 10:12:06 CST 2007

Hans Meine wrote:
>> Fortran order arrays can be preserved but it takes a little extra work
>> because backward compatible expectations had to be met.  See for example
>> the order argument to the copy method of arrays.
> What do you mean exactly (if you have something specific in mind at all)? 

I meant  a.copy('A') will preserver the order of 'a' on copy.
> Should the order be configurable for all operations, and default to "C" for 
> backwards compatibility?
Yes, basically that should be the case when it matters.  

> (BTW: copy() has no arguments yet in my numpybook, page 57.)
> At last, I had a look at the relevant code for adding arrays; this is what I 
> found: In core/src/umathmodule.c.src, the code templates for the relevant 
> ufuncs (e.g. add) are defined, which will get expanded to e.g. DOUBLE_add.
> This will get wrapped into an ufunc in 'InitOperators' using
>   f = PyUFunc_FromFuncAndData(add_functions, add_data, add_signatures, 18,
>                               2, 1, PyUFunc_Zero, "add",
>                               "adds the arguments elementwise.", 0);
> and inserted as "add" into a dictionary which is later passed to
> PyArray_SetNumericOps.  This will install the ufunc in the static
> NumericOps struct 'n_ops', which is used in the 'array_add' function
> installed in the PyTypeObject.

Yep,  you've got the code path correct.
> Thus, it is the ufunc code (not suprisingly), which seems to call
> DOUBLE_add for the inner loop:
Indeed.  The ufuncs store inner loops for all the data-types that are 
supported by the ufunc.  These inner loops allow for "strided" memory 
access and are called by the general ufunc code.  The loop structure 
holds the information needed to loop over the ufuncs.
> | static void
> | DOUBLE_add(char **args, intp *dimensions, intp *steps, void *func)
> | {
> |     register intp i;
> |     intp is1=steps[0],is2=steps[1],os=steps[2], n=dimensions[0];
> |     char *i1=args[0], *i2=args[1], *op=args[2];
> |     for(i=0; i<n; i++, i1+=is1, i2+=is2, op+=os) {
> |         *((double *)op)=*((double *)i1) + *((double *)i2);
> |     }
> | }
> If I understood David correctly, he suggested an unstrided variant of
> this inner loop, which simply uses pointer increments. 
I'm not sure if he was saying that or not.  But, more generally, I'm all 
for optimizations at this level.  But, they will have to be done 
differently for different cases with this loop as the fall-back.
>  While this is
> a good idea (also probably quite some work), the real thing bugging me is
> that the above DOUBLE_add could (and should!) be called by the ufunc
> framework in such a way that it is equally efficient for C and Fortran
> arrays.
Yes, that's what I was talking about.  There is actually a path through 
the ufunc code where this loop is called only once.  The requirement 
right now is that all the arrays are C-contiguous, but this should be 
changed to all arrays have the same contiguousness (and the output-array 
creation code changed to create Fortran-order arrays when the inputs are 
all Fortran-order).

> BTW: What are the "void *func" in core/src/umathmodule.c.src arguments
> for?  Shouldn't that be "void * /*func*/", since the parameters are
> never used?

Sure, it could be void * /* func*/ 

You'll have to forgive my ignorance, but does that matter?   Are we 
using up a register or something by naming an unused argument?

-Travis O.

More information about the Numpy-discussion mailing list