[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)
James Philbin
philbinj@gmail....
Sun Mar 23 13:22:48 CDT 2008
OK, I'm really impressed with the improvements in vectorization for
gcc 4.3. It seems able to vectorize real loops, which wasn't the case
with 4.1. I think Chuck's right that we should simply special-case
contiguous data and let the auto-vectorizer do the rest. Something
like this for the ufuncs:
/**begin repeat
#TYPE=(BOOL,
BYTE,UBYTE,SHORT,USHORT,INT,UINT,LONG,ULONG,LONGLONG,ULONGLONG,FLOAT,DOUBLE,LONGDOUBLE)*2#
#OP=||, +*13, ^, -*13#
#kind=add*14, subtract*14#
#typ=(Bool, byte, ubyte, short, ushort, int, uint, long, ulong,
longlong, ulonglong, float, double, longdouble)*2#
*/
/* Contiguous case: a plain indexed loop the auto-vectorizer can handle. */
static void
@TYPE@_@kind@_contig(@typ@ *i1, @typ@ *i2, @typ@ *op, intp n)
{
    intp i;
    for (i=0; i<n; i++) {
        op[i] = i1[i] @OP@ i2[i];
    }
}
static void
@TYPE@_@kind@(char **args, intp *dimensions, intp *steps, void *func)
{
    register intp i;
    intp is1=steps[0],is2=steps[1],os=steps[2], n=dimensions[0];
    char *i1=args[0], *i2=args[1], *op=args[2];
    /* steps are in bytes, so contiguous means a stride of sizeof(@typ@) */
    if (is1==sizeof(@typ@) && is2==sizeof(@typ@) && os==sizeof(@typ@)) {
        @TYPE@_@kind@_contig((@typ@ *)i1, (@typ@ *)i2, (@typ@ *)op, n);
        return;
    }
    /* general strided fallback */
    for(i=0; i<n; i++, i1+=is1, i2+=is2, op+=os) {
        *((@typ@ *)op)=*((@typ@ *)i1) @OP@ *((@typ@ *)i2);
    }
}
/**end repeat**/
We also need to add -ftree-vectorize to the standard compile flags at
least for the ufuncs.
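
For anyone who wants to try this outside the template machinery, here is
a rough sketch of what one expansion (TYPE=DOUBLE, kind=add) might look
like as plain C. It is not the actual NumPy source. It assumes ptrdiff_t
as a stand-in for intp, and it treats the steps as byte strides, so the
contiguous test compares against sizeof(double).

```c
#include <stddef.h>

/* Contiguous kernel: a simple indexed loop over typed pointers, which is
   the form an auto-vectorizer (e.g. gcc with -ftree-vectorize) is most
   likely to handle. ptrdiff_t stands in for NumPy's intp (assumption). */
static void
DOUBLE_add_contig(const double *i1, const double *i2, double *op, ptrdiff_t n)
{
    ptrdiff_t i;
    for (i = 0; i < n; i++) {
        op[i] = i1[i] + i2[i];
    }
}

/* Generic inner loop with the same argument layout as in the post:
   args = {in1, in2, out}, dimensions[0] = element count,
   steps = byte strides for in1, in2, out. */
static void
DOUBLE_add(char **args, ptrdiff_t *dimensions, ptrdiff_t *steps, void *func)
{
    ptrdiff_t i;
    ptrdiff_t is1 = steps[0], is2 = steps[1], os = steps[2];
    ptrdiff_t n = dimensions[0];
    char *i1 = args[0], *i2 = args[1], *op = args[2];
    const ptrdiff_t sz = (ptrdiff_t)sizeof(double);

    (void)func; /* unused in this sketch */

    /* Fast path: all three operands are contiguous doubles. */
    if (is1 == sz && is2 == sz && os == sz) {
        DOUBLE_add_contig((const double *)i1, (const double *)i2,
                          (double *)op, n);
        return;
    }

    /* Fallback: walk each operand by its own byte stride. */
    for (i = 0; i < n; i++, i1 += is1, i2 += is2, op += os) {
        *((double *)op) = *((double *)i1) + *((double *)i2);
    }
}
```

Calling DOUBLE_add with all three steps equal to sizeof(double) takes
the contiguous path. Any other stride, for example every second element
of an input, falls through to the strided loop.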
James