[Numpy-discussion] numpy ufuncs and COREPY - any info?

Andrew Friedley afriedle@indiana....
Fri May 22 06:59:17 CDT 2009

Francesc Alted wrote:
> On Friday 22 May 2009 11:42:56 Gregor Thalhammer wrote:
>> dmitrey wrote:
>> 3) Improving performance by using multiple cores is much more difficult.
>> A significant speedup is possible only for sufficiently large (>1e5)
>> arrays. Where a speed gain is possible, the MKL uses several cores.
>> Some experimentation showed that by adding a few OpenMP constructs you
>> could get a similar speedup with numpy.
>> 4) numpy.dot uses optimized implementations.
> Good points, Gregor.  However, I wouldn't say that improving performance by
> using multiple cores is *that* difficult, but rather that multiple cores can
> only be used efficiently *when* memory bandwidth is not a limitation.  An
> example of this is the computation of transcendental functions, where, even
> using vectorized implementations, the computation speed is still CPU-bound
> in many cases.  And you have seen very good speed-ups yourself for
> these cases with your implementation of numexpr/MKL :)

Using multiple cores is pretty easy for element-wise ufuncs; no
communication needs to occur and the work partitioning is trivial.  And
actually I've found with some initial testing that multiple cores do
still help when you are memory bound.  I don't fully understand why yet,
though I have some ideas.  One reason is multiple memory controllers due
to multiple sockets (e.g. Opteron).  Another is that each thread is
pulling memory from a different bank, utilizing more bandwidth than a
single sequential thread could.  However, if that's the case, we could
possibly come up with code for a single thread that achieves (nearly)
the same additional throughput.
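
To illustrate how simple the partitioning is, here is a rough sketch in
plain Python threads.  This is not how numpy or CorePy does it; the names
threaded_ufunc and nthreads are made up for this example, and it assumes a
1-D floating-point array and a unary ufunc that accepts an out= argument.

```python
import threading
import numpy as np

def threaded_ufunc(ufunc, x, nthreads=4):
    """Apply a unary element-wise ufunc to a 1-D float array,
    giving each thread one contiguous chunk (hypothetical helper)."""
    out = np.empty_like(x)
    # nthreads + 1 evenly spaced chunk boundaries from 0 to len(x)
    bounds = np.linspace(0, x.shape[0], nthreads + 1).astype(int)

    def work(lo, hi):
        # Each thread reads and writes only its own slice, so no
        # locking or communication between threads is needed.
        ufunc(x[lo:hi], out=out[lo:hi])

    threads = [threading.Thread(target=work, args=(lo, hi))
               for lo, hi in zip(bounds[:-1], bounds[1:])]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

x = np.linspace(0.0, 10.0, 1000000)
y = threaded_ufunc(np.sin, x, nthreads=4)
```

Whether this actually runs faster depends on the ufunc loop releasing the
GIL (which I believe numpy does for non-object dtypes) and on the array
being large enough to amortize the thread start-up cost.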

