[Numpy-discussion] Fast threading solution thoughts

Sturla Molden sturla@molden...
Thu Feb 12 08:27:51 CST 2009

On 2/12/2009 12:34 PM, Dag Sverre Seljebotn wrote:

> FYI, I am one of the core Cython developers and can make such 
> modifications in Cython itself as long as there's consensus on how it 
> should look on the Cython mailing list.  My problem is that I don't 
> really know OpenMP and have little experience with it, so I'm not the 
> best person for creating a draft for how such high-level OpenMP 
> constructs should look like in Cython.

I don't know the Cython internals, but I do know OpenMP. I mostly use it 
with Fortran.

The question is: should OpenMP directives be written as special comments
in the Cython code (like the !$omp lines in Fortran or the #pragma lines
in C), or should they be special objects?

As for the GIL: no, I don't think nogil should be implied. But Python
objects should only be allowed as shared variables. Access to them is
then synchronized as usual for shared variables in OpenMP (#pragma omp
critical).

Here is my suggestion for syntax. If you just follow a consistent
translation scheme, you don't need to know OpenMP in detail. Here is
the scheme:

with openmp('parallel for', argument=iterable, ...):
    --> insert the pragma directly above the for loop

with openmp(directive, argument=iterable, ...):
    --> insert the pragma and enclose the block in braces

with openmp('atomic'):
    --> insert the pragma directly above the statement

openmp('barrier')
    --> insert the pragma directly

This, by the way, covers all the OpenMP directives. This is how it
should translate:

with openmp('parallel for', private=(i,), shared=(n,)):
    for i in range(n):
        ...  # whatever

Compiles to:

#pragma omp parallel for \
    private(i) \
    shared(n)
for (i=0; i<n; i++) {
    /* whatever */
}
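A minimal sketch of the translation scheme itself, written in Python the way a code generator might do it. It assumes variable names are passed as strings and that the body of a with block has already been translated into C lines; the helper names clause, pragma and translate are hypothetical and not part of Cython.

```python
# Hypothetical sketch of the proposed openmp(...) -> "#pragma omp" scheme.
# Variable names are plain strings here; in Cython they would be real
# cdef'd variables.

# Directives whose pragma goes directly above a single statement/loop
# (no braces are added around the body).
DIRECT = {"for", "atomic"}


def clause(name, value):
    """Format one keyword argument as an OpenMP clause."""
    if name == "reduction":
        op, var = value                      # ('+', 'k') -> reduction(+:k)
        return f"reduction({op}:{var})"
    if isinstance(value, tuple):             # private=('i',) -> private(i)
        return f"{name}({','.join(value)})"
    return f"{name}({value})"                # default='private' -> default(private)


def pragma(directive, **clauses):
    """Build the pragma line for a directive and its clauses."""
    parts = ["#pragma omp", directive]
    parts += [clause(name, value) for name, value in clauses.items()]
    return " ".join(parts)


def translate(directive, body, **clauses):
    """Translate a `with openmp(...)` block whose body is a list of C lines.

    'for' and 'atomic' directives get the pragma directly above the body;
    every other directive gets the pragma followed by a braced block.
    """
    head = pragma(directive, **clauses)
    if directive.split()[-1] in DIRECT:
        return [head] + body
    return [head, "{"] + ["    " + line for line in body] + ["}"]


if __name__ == "__main__":
    loop = ["for (i=0; i<n; i++) {", "    /* whatever */", "}"]
    print("\n".join(translate("parallel for", loop,
                              private=("i",), shared=("n",))))
```

Keeping the clause formatting in one place means every directive, including ones the sketch never special-cases, is translated the same mechanical way.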

with openmp('parallel sections',
            reduction=('+',k), private=(i,j)):

    with openmp('section'):
        i = foobar()
        k += i

    with openmp('section'):
        j = foobar()
        k += j

Compiles to:

#pragma omp parallel sections \
    reduction(+:k) \
    private(i,j)
{
    #pragma omp section
    {
        i = foobar();
        k += i;
    }

    #pragma omp section
    {
        j = foobar();
        k += j;
    }
}
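To make the reduction clause concrete, here is a rough emulation with plain Python threads of reduction(+:k) over two sections: each section works on a private copy of k initialised to the identity of the operator (0 for +), and the private copies are combined into the original k at the end. foobar and parallel_sections are hypothetical stand-ins, and because these are Python threads under the GIL, this only illustrates the semantics, not a speedup.

```python
import operator
from concurrent.futures import ThreadPoolExecutor


def foobar(x):
    """Hypothetical placeholder for the real work done in a section."""
    return x * x


def parallel_sections(sections, k=0, op=operator.add, identity=0):
    """Run each section in its own thread, reduction-style.

    Each thread starts from a private copy of k set to `identity`;
    afterwards the private copies are folded into the original k.
    """
    def run_section(section):
        k_private = identity                 # private copy, reset to identity
        k_private = op(k_private, section())
        return k_private

    with ThreadPoolExecutor(max_workers=len(sections)) as pool:
        partials = list(pool.map(run_section, sections))

    for k_private in partials:               # combine into the shared k
        k = op(k, k_private)
    return k


k = parallel_sections([lambda: foobar(3), lambda: foobar(4)])
print(k)  # 0 + 9 + 16 = 25
```

The identity argument matters: with op=operator.mul the private copies must start at 1, just as OpenMP initialises them differently for * than for +.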

With Python objects, the programmer must synchronize access:

with openmp('parallel for', shared=(pyobj,n), private=(i,)):
    for i in range(n):
        with openmp('critical'):
            pyobj += i

Compiles to:

#pragma omp parallel for \
    shared(pyobj,n) \
    private(i)
for (i=0; i<n; i++) {
    #pragma omp critical
    {
        pyobj += i;
    }
}
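The same rule can be illustrated in plain Python: a threading.Lock plays the part of the unnamed critical section, and each thread takes a share of the loop iterations (dealt out round-robin, similar to schedule(static,1)). The function name is hypothetical, and this sketches the synchronization pattern only, not what Cython would generate.

```python
import threading


def parallel_for_critical(n, num_threads=4):
    """Sum range(n) into a shared object, guarding each update with a lock."""
    shared = {"pyobj": 0}          # the shared Python object
    critical = threading.Lock()    # one lock = one unnamed critical section

    def loop_body(tid):
        # Iterations dealt out round-robin; i is private to each thread.
        for i in range(tid, n, num_threads):
            with critical:         # only one thread updates pyobj at a time
                shared["pyobj"] += i

    threads = [threading.Thread(target=loop_body, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                   # implicit barrier at the end of the loop
    return shared["pyobj"]


print(parallel_for_critical(100))  # 4950
```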

Atomic and barriers:

with openmp('atomic'): i += j

Compiles to:

#pragma omp atomic
i += j;

with openmp('parallel', default='private', shared=(n,)):
    for i in range(n):
        ...  # whatever
    openmp('barrier')

Compiles to:

#pragma omp parallel default(private) shared(n)
{
    for (i=0; i<n; i++) { /* whatever */ }
    #pragma omp barrier
}
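For the barrier, Python's threading.Barrier has matching semantics: no thread returns from wait() until every thread in the team has reached it. The sketch below (function name hypothetical) uses it to separate a phase where each thread writes its own slot from a phase where every thread reads all slots.

```python
import threading


def two_phase(num_threads=4):
    """Each thread writes its slot, waits at the barrier, then reads all slots."""
    partial = [0] * num_threads
    totals = [0] * num_threads
    barrier = threading.Barrier(num_threads)   # plays "#pragma omp barrier"

    def region(tid):
        partial[tid] = tid + 1       # phase 1: write only this thread's slot
        barrier.wait()               # nobody continues until all have written
        totals[tid] = sum(partial)   # phase 2: every slot is now filled in

    threads = [threading.Thread(target=region, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


print(two_phase(4))  # [10, 10, 10, 10]
```

Without the barrier, a fast thread could sum partial before slower threads have written their slots.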

That is my suggestion. It should be easy to implement, as you don't
need to learn OpenMP first (not that it is difficult).

Sturla Molden
