[Numpy-discussion] NumPy re-factoring project

Sturla Molden sturla@molden...
Fri Jun 11 08:31:45 CDT 2010

Den 11.06.2010 10:17, skrev Pauli Virtanen:
>> 1. Collect an array of pointers to each subarray (e.g. using
>> std::vector<dtype*>  or dtype**)
>> 2. Dispatch on the pointer array...
> This is actually what the current ufunc code does.
> The innermost dimension is handled via the ufunc loop, which is a simple
> for loop with constant-size step and is given a number of iterations. The
> array iterator objects are used only for stepping through the outer
> dimensions. That is, it essentially steps through your dtype** array,
> without explicitly constructing it.

Yes, exactly my point. And because the iterator does not explicitely 
construct the array, it sucks for parallel programming (e.g. with OpenMP):

- The iterator becomes a bottleneck to which access must be serialized 
with a mutex.
- We cannot do proper work scheduling (load balancing)

> We could in principle use OpenMP to parallelize the outer loop, as long
> as the inner ufunc part is implemented in C, and does not go back into
> Python.

Yes. And for that we need to explicitely construct a pointer array.

> But if the problem is memory-bound, it is not clear that
> parallelization helps.

That is often the case, yes.

Memory access patterns is one of the most important issues speed wise, 
particularly when working with strided array slices.

It would also make sence to evaluate expressions like "y = b*x + a" 
without a temporary array for b*x. I know roughly how to do it, but 
don't have time to look at it before next year. (Yes I know about 
numexpr, I am talking about plain Python code.)


More information about the NumPy-Discussion mailing list