[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

Travis E. Oliphant oliphant@enthought....
Sat Mar 22 13:20:30 CDT 2008


Charles R Harris wrote:
>
>
> On Sat, Mar 22, 2008 at 11:43 AM, Neal Becker <ndbecker2@gmail.com> wrote:
>
>     James Philbin wrote:
>
>     > Personally, I think that the time would be better spent optimizing
>     > routines for single-threaded code and relying on BLAS and LAPACK
>     > libraries to use multiple cores for more complex calculations. In
>     > particular, doing some basic loop unrolling and SSE versions of the
>     > ufuncs would be beneficial. I have some experience writing SSE code
>     > using intrinsics and would be happy to give it a shot if people tell
>     > me what functions I should focus on.
>     >
>     > James
>
>     gcc keeps advancing autovectorization.  Is manual vectorization
>     worth the trouble?
>
>
> The inner loop of a unary ufunc looks like
>
> /*UFUNC_API*/
> static void
> PyUFunc_d_d(char **args, intp *dimensions, intp *steps, void *func)
> {
>     intp i;
>     char *ip1=args[0], *op=args[1];
>     for(i=0; i<*dimensions; i++, ip1+=steps[0], op+=steps[1]) {
>         *(double *)op = ((DoubleUnaryFunc *)func)(*(double *)ip1);
>     }
> }
>
>
> While it might help the compiler to put the steps on the stack as 
> constants, it is hard to see how the compiler could vectorize the loop 
> given the information available and the fact that the input data might 
> not be aligned or contiguous. I suppose one could make a small local 
> buffer, copy the data into it, and then use SSE, and that might 
> actually help for some things. But it is also likely that the function 
> itself won't deal gracefully with vectorized data.

I think the thing to do is to special-case the code: if the strides 
allow vectorization, a different bit of code is executed, and the 
current loop is kept as the final, general case.

Something like this would be relatively straightforward, if a bit 
tedious, to do.
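
A rough sketch of what that dispatch might look like, assuming "strides 
work" means both steps equal sizeof(double). The names d_d_loop_sketch 
and square_sketch and the intp typedef are stand-ins for illustration, 
not actual numpy code. The fast path still calls through the function 
pointer, so it only shows where a vectorized (e.g. SSE) body would go:

```c
#include <assert.h>
#include <stddef.h>

typedef ptrdiff_t intp;                    /* stand-in for numpy's intp */
typedef double (DoubleUnaryFunc)(double);  /* scalar double -> double */

/* Hypothetical example function used to exercise the loop. */
static double
square_sketch(double x)
{
    return x * x;
}

/* Sketch only: same signature as the loop quoted above, with a
 * contiguous special case tried first and the strided loop as fallback. */
static void
d_d_loop_sketch(char **args, intp *dimensions, intp *steps, void *func)
{
    intp i, n = dimensions[0];
    char *ip1 = args[0], *op = args[1];
    DoubleUnaryFunc *f = (DoubleUnaryFunc *)func;

    assert(n >= 0);

    if (steps[0] == (intp)sizeof(double) &&
        steps[1] == (intp)sizeof(double)) {
        /* Contiguous input and output: plain indexed loop. An alignment
         * check and a SIMD body would be added here. */
        double *in = (double *)ip1;
        double *out = (double *)op;
        for (i = 0; i < n; i++) {
            out[i] = f(in[i]);
        }
        return;
    }

    /* General case: arbitrary byte strides, as in the current code. */
    for (i = 0; i < n; i++, ip1 += steps[0], op += steps[1]) {
        *(double *)op = f(*(double *)ip1);
    }
}
```

Both branches compute the same result; only the contiguous one gives 
the compiler (or hand-written intrinsics) a simple unit-stride loop to 
work with.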

-Travis




