[Numpy-discussion] NumPy re-factoring project

Pauli Virtanen pav@iki...
Fri Jun 11 11:42:11 CDT 2010

Fri, 11 Jun 2010 15:31:45 +0200, Sturla Molden wrote:
>> The innermost dimension is handled via the ufunc loop, which is a
>> simple for loop with constant-size step and is given a number of
>> iterations. The array iterator objects are used only for stepping
>> through the outer dimensions. That is, it essentially steps through
>> your dtype** array, without explicitly constructing it.
> Yes, exactly my point. And because the iterator does not explicitly
> construct the array, it sucks for parallel programming (e.g. with
> OpenMP):
> - The iterator becomes a bottleneck to which access must be serialized
> with a mutex.
> - We cannot do proper work scheduling (load balancing)

I don't necessarily agree: you can do

    for parallelized outer loop {
        critical section {
            p = get iterator pointer
        }
        inner loop in region `p`
    }

This does allow load balancing etc., as a free processor can immediately 
grab the next available slice. Also, it would be easier to implement with 
OpenMP pragmas in the current code base.
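
A rough sketch of what I mean, in C with OpenMP. The names `outer_iter`,
`outer_iter_next`, `inner_loop` and `scale_by_two` are made up for
illustration and are not NumPy's iterator API. The array is assumed to be
2-D and C-contiguous, and `schedule(dynamic)` is my assumption for how
free threads would pick up the next slice. Only the iterator step sits
inside the critical section:

```c
#include <stddef.h>

/* Hypothetical stand-in for an outer-dimension iterator: it hands out
   the start of each innermost row, one row per call. */
typedef struct {
    double *base;    /* start of the data */
    size_t n_outer;  /* number of rows (outer iterations) */
    size_t n_inner;  /* elements per row (inner loop length) */
    size_t next;     /* index of the next row to hand out */
} outer_iter;

/* Step the iterator. The caller must hold exclusive access.
   Returns NULL once all rows have been handed out. */
static double *outer_iter_next(outer_iter *it)
{
    if (it->next >= it->n_outer)
        return NULL;
    return it->base + (it->next++) * it->n_inner;
}

/* Inner loop: fixed iteration count and constant step, in the spirit
   of a ufunc loop. Here it just doubles each element. */
static void inner_loop(double *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] *= 2.0;
}

void scale_by_two(double *data, size_t n_outer, size_t n_inner)
{
    outer_iter it = { data, n_outer, n_inner, 0 };  /* shared by all threads */

    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < (long)n_outer; i++) {
        double *p;

        /* Serialized part: fetch the next slice from the iterator. */
        #pragma omp critical(outer_iter_step)
        p = outer_iter_next(&it);

        /* Parallel part: runs outside the critical section. */
        inner_loop(p, n_inner);
    }
}
```

Calling `scale_by_two(a, 3, 2)` on a 3x2 array doubles all six elements,
whichever thread ends up with which row. Built without OpenMP support,
the pragmas are ignored and the loop runs serially with the same result.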

Of course, the assumption here is that the outer iterator overhead is
small compared to the duration of the inner loop. That overhead must then
be weighed against the memory access cost of the explicit dtype** array.

Pauli Virtanen
