[Numpy-discussion] numpy ufuncs and COREPY - any info?
Andrew Friedley
afriedle@indiana....
Fri May 22 06:59:17 CDT 2009
Francesc Alted wrote:
> On Friday 22 May 2009 11:42:56 Gregor Thalhammer wrote:
>> dmitrey wrote:
>> 3) Improving performance by using multi cores is much more difficult.
>> Only for sufficiently large (>1e5) arrays a significant speedup is
>> possible. Where a speed gain is possible, the MKL uses several cores.
>> Some experimentation showed that adding a few OpenMP constructs you
>> could get a similar speedup with numpy.
>> 4) numpy.dot uses optimized implementations.
>
> Good points Gregor. However, I wouldn't say that improving performance by
> using multi cores is *that* difficult, but rather that multi cores can only be
> used efficiently *whenever* the memory bandwidth is not a limitation. An
> example of this is the computation of transcendental functions, where, even
> using vectorized implementations, the computation speed is still CPU-bound
> in many cases. And you have seen very good speed-ups yourself for
> these cases with your implementation of numexpr/MKL :)
Using multiple cores is pretty easy for element-wise ufuncs; no
communication needs to occur and the work partitioning is trivial. And
actually I've found with some initial testing that multiple cores do
still help when you are memory bound. I don't fully understand why yet,
though I have some ideas. One reason is multiple memory controllers due
to multiple sockets (e.g. Opteron). Another is that each thread is
pulling memory from a different bank, utilizing more bandwidth than a
single sequential thread could. However, if that's the case, we could
possibly come up with code for a single thread that achieves (nearly)
the same additional throughput.
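
To make the "trivial partitioning" point concrete, here is a rough
sketch of splitting one element-wise ufunc call across threads. It is
only an illustration, not how CorePy or numexpr do it: the helper name
parallel_ufunc and the n_threads parameter are made up for this
example, and it assumes a 1-D floating-point array and that numpy
releases the GIL inside the ufunc loop, so the threads can actually run
concurrently.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def parallel_ufunc(ufunc, x, n_threads=4):
    """Apply a unary ufunc to a 1-D float array, one contiguous chunk per thread.

    Hypothetical helper for illustration only.
    """
    out = np.empty_like(x)
    # n_threads + 1 evenly spaced boundaries covering [0, x.size].
    bounds = np.linspace(0, x.size, n_threads + 1).astype(int)

    def work(i):
        lo, hi = bounds[i], bounds[i + 1]
        # Each thread writes only to its own slice of `out`, so no
        # locking or communication between threads is required.
        ufunc(x[lo:hi], out=out[lo:hi])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Consume the iterator so any exception raised in a worker surfaces here.
        list(pool.map(work, range(n_threads)))
    return out


if __name__ == "__main__":
    x = np.linspace(0.0, 10.0, 1_000_000)
    y = parallel_ufunc(np.sin, x, n_threads=4)
    print(np.allclose(y, np.sin(x)))
```

Whether this is actually faster than a single np.sin(x) call depends on
the array size, the ufunc, and the memory system, so it is worth timing
on the machine in question before drawing conclusions.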
Andrew
