[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)
Sun Mar 23 09:19:11 CDT 2008
David Cournapeau wrote:
> Francesc Altet wrote:
>> Why not? IMHO, complex operations requiring a great deal of operations
>> per word, like trigonometric, exponential, etc..., are the best
>> candidates to take advantage of several cores or even SSE instructions
>> (not sure whether SSE supports this sort of operations, though).
> I was talking about the general "using openmp" thing in numpy context.
> If it was just adding one line at one place in the source code, someone
> would already have done it, no ? But there are build issues, for
> example: you have to add support for openmp at compilation and link, you
> have to make sure it works with compilers which do not support it.
> Even without taking into account the build issues, there is the problem
> of correctly annotating the source code depending on the context. For
> example, many interesting places where to use openmp in numpy would need
> more than just using the "parallel for" pragma. From what I know of
> openMP, the annotations may depend on the kind of operation you are
> doing (independent element-wise operations or not). Also, the test case
> posted before use a really big N, where you are sure that using
> multi-thread is efficient. What happens if N is small ? Basically, the
> posted test is the best situation which can happen (big N, known
> operation with known context, etc...). That's a proof that openMP works,
> not that it can work for numpy.
> I find the example of sse rather enlightening: in theory, you should
> expect a 100-300 % speed increase using sse, but even with pure C code
> in a controlled manner, on one platform (linux + gcc), with varying,
> recent CPU, the results are fundamentally different. So what would
> happen in numpy, where you don't control things that much ?
Well of course my goal was not to say that my simple testcase can be
copied/pasted into numpy :)
Of course it is one of the best cases for using OpenMP.
Of course pragmas can be more complex than that (you can specify which
variables can or cannot be shared, for instance).
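
To illustrate the point about sharing clauses, here is a minimal sketch (not
numpy code; the function name sum_of_squares is made up for the example). The
accumulator cannot simply be shared between threads, so it needs a reduction
clause, and the temporary has to be private to each thread:

```c
/* Hypothetical example: sum of squares of an array.
 * A bare "parallel for" would race on `total`, so the data-sharing
 * clauses spell out what is shared, private, or reduced. */
double sum_of_squares(const double *x, long n)
{
    double total = 0.0;
    double t;
    long i;

    /* x and n are only read, so they can be shared; t is a per-thread
     * scratch variable; total is combined across threads with "+". */
    #pragma omp parallel for default(none) shared(x, n) private(t) reduction(+:total)
    for (i = 0; i < n; i++) {
        t = x[i];
        total += t * t;
    }
    return total;
}
```

If the compiler does not know OpenMP, the pragma is ignored and the loop
simply runs serially, which is why this kind of annotation is tempting.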
The size: using OpenMP will be slower on small arrays, that is clear,
but a user doing very large computations is smart enough to know when
he needs to split his job into threads. The obvious solution is to
provide the user with both parallel (//) and non-parallel functions.
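
A rough sketch of what "parallel and non-parallel functions" could look like,
assuming a simple element-wise operation. All the names here (axpy_serial,
axpy_parallel, axpy_auto, PARALLEL_THRESHOLD) are invented for the example,
and the threshold value is a placeholder, not a measured number:

```c
/* Hypothetical cut-off below which threading is assumed not to pay off. */
#define PARALLEL_THRESHOLD 100000L

/* Serial version: out[i] = a * x[i] + y[i]. */
void axpy_serial(double a, const double *x, const double *y,
                 double *out, long n)
{
    long i;
    for (i = 0; i < n; i++) {
        out[i] = a * x[i] + y[i];
    }
}

/* Parallel version: iterations are independent, so a plain
 * "parallel for" is enough here. */
void axpy_parallel(double a, const double *x, const double *y,
                   double *out, long n)
{
    long i;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        out[i] = a * x[i] + y[i];
    }
}

/* Optional dispatcher: pick a version based on the array size. */
void axpy_auto(double a, const double *x, const double *y,
               double *out, long n)
{
    if (n > PARALLEL_THRESHOLD) {
        axpy_parallel(a, x, y, out, n);
    } else {
        axpy_serial(a, x, y, out, n);
    }
}
```

OpenMP also has an if clause on the parallel directive (for example
"#pragma omp parallel for if(n > PARALLEL_THRESHOLD)") that makes the same
decision at run time without a separate serial function.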
SSE: SSE can help a lot, but multithreading scales with the number of
cores, whereas single-threaded SSE-based solutions don't.
Build/link : It is an issue. It has to be tested. I do not know because
I haven't even tried.
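
For the "compilers which do not support it" part, one common pattern is to
guard OpenMP-specific calls with the standard _OPENMP macro, which a compiler
defines only when OpenMP is enabled (with gcc, that means compiling and
linking with -fopenmp). A small sketch, where the function name
available_threads is made up:

```c
#ifdef _OPENMP
#include <omp.h>   /* only available when OpenMP is enabled */
#endif

/* Hypothetical helper: how many threads a parallel region could use.
 * Falls back to 1 when the code is built without OpenMP. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

This does not solve the build-system side (detecting the right flags per
compiler), it only keeps the source compiling everywhere.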
So, IMHO it would be nice to try putting some simple OpenMP pragmas into
numpy *only to see what is going on*.
Even if it only works with gcc, or even if... I do not know... it would
be a first step. Step by step :)
If the performance is really that bad, ok, forget about it... but it
would be sad, because the next generation of CPUs will not be more
powerful, they will "only" have more than one or two cores on the same
chip.