[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

David Cournapeau david@ar.media.kyoto-u.ac...
Sun Mar 23 22:37:27 CDT 2008

Gnata Xavier wrote:
> Ok, I will try to see what I can do, but it is sure that we do need the 
> plug-in system first (read "before the threads in the numpy release"). 
> During the devel of 1.1, I will try to find some time to understand 
> where I should put some pragmas into the ufuncs using a very conservative 
> approach. Any people with some OpenMP knowledge are welcome, because I'm 
> not an OpenMP expert but only an OpenMP user in my C/C++ codes.

Note that the plug-in idea is just my own idea; it has not been agreed 
upon by anyone else. So maybe it won't be done for numpy 1.1, or at 
all. It depends on the main maintainers of numpy.

> and the results:
>
> 10000000       80    10.308471    30.007250
>  1000000      160     1.902563     5.800172
>   100000      320     0.543008     1.123274
>    10000      640     0.206823     0.223031
>     1000     1280     0.088898     0.044268
>      100     2560     0.150429     0.008880
>       10     5120     0.289589     0.002084
>
> ---> On this machine, we should start to use threads *in this testcase* 
> iff size >= 10000 (a 100*100 image is a very, very small one :))

Maybe OpenMP can be more clever, but this tends to show that OpenMP, when 
used naively, can *not* decide how many threads to use. That's really 
the core problem: again, I don't know much about OpenMP, but almost any 
project using multi-threading/multi-processing that is not embarrassingly 
parallel has the problem that things get much slower in the many cases 
where thread creation/management and co have a lot of overhead 
relative to the computation. The problem is to determine that threshold N, 
dynamically, or in a way which works well for most cases. OpenMP was 
created for HPC, where you have very large data; it is not so obvious to 
me that it is adapted to numpy, which has to be much more flexible. Being 
fast on a given problem is easy; being fast on a whole range, that's 
another story: the problem really is to be as fast as before on small 
arrays while getting faster on the large ones.

The fact that Matlab, while having many more resources than us, took 
years to do it makes me extremely skeptical about the efficient use of 
multi-threading in numpy without real benchmarks. They have a dedicated 
team, which developed a JIT for Matlab that "inserts" multi-threaded code 
on the fly (for m-files, not when you are in the interpreter), and which 
uses multi-threaded blas/lapack (something already available in numpy, 
depending on the blas/lapack you are using).

But again, and that's really the only thing I have to say: prove me wrong :)

