[Numpy-discussion] Fast threading solution thoughts
Thu Feb 12 13:40:53 CST 2009
> If your problem is evaluating vector expressions just like the above
> (i.e. without using transcendental functions like sin, exp, etc...),
> usually the bottleneck is on memory access, so using several threads is
> simply not going to help you achieve better performance, but rather
> the contrary (you have to deal with the additional thread overhead).
> So, frankly, I would not waste more time trying to parallelize that.
I had a feeling this would be the case; I just haven't been sure at
what point it comes into play. I really need to do some tests to
understand exactly how CPU load and memory bandwidth interplay in
these situations. I have worked with GPUs before and often the reason
the GPU is faster than the CPU is simply the higher memory bandwidth.
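One quick test along these lines is to time a pure add/multiply expression and back out an effective bandwidth figure. The sketch below (array size and the traffic estimate are my assumptions; NumPy creates temporaries, so the true traffic is somewhat higher) illustrates why such expressions tend to saturate memory rather than CPU:

```python
# Rough bandwidth probe for a memory-bound elementwise expression.
import numpy as np
import timeit

n = 2_000_000  # hypothetical size; large enough to exceed CPU caches
a = np.random.rand(n)
b = np.random.rand(n)

# Each element is touched only a few times, so the cost is dominated by
# streaming the arrays through memory, not by the add/multiply ALU work.
t = timeit.timeit(lambda: 2.0 * a + 3.0 * b, number=10) / 10

# Crude traffic estimate: read a, read b, write result, plus one temporary.
bandwidth_gbs = 4 * n * 8 / t / 1e9
print(f"{t * 1e3:.2f} ms/run, ~{bandwidth_gbs:.1f} GB/s effective")
```

If the effective figure lands near the machine's memory bandwidth, adding threads to this expression cannot help much.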
> As an example, in the recent support of VML in numexpr we have disabled
> the use of VML (as well as the OpenMP threading support that comes with
> it) in cases like yours, where only additions and multiplications are
> performed (these operations are very fast in modern processors, and the
> sole bottleneck for this case is the memory bandwidth, as I've said).
> However, in case of expressions containing operations like division or
> transcendental functions, then VML activates automatically, and you can
> make use of several cores if you want. So, if you are in this case,
> and you have access to Intel MKL (the library that contains VML), you
> may want to give numexpr a try.
OK, this is very interesting indeed. I didn't know that numexpr had
support for VML, which comes with OpenMP threading support. I will
definitely have a look at this. Thanks!
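For anyone else following along, a minimal sketch of the usage being described (assuming numexpr is installed; the expressions are illustrative, and whether VML actually kicks in depends on the build linking against Intel MKL):

```python
# Sketch: numexpr dispatch for memory-bound vs. compute-heavy expressions.
import numpy as np
import numexpr as ne

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Pure adds/multiplies: memory-bound, so numexpr evaluates these with its
# own blocked virtual machine and VML is not used.
r1 = ne.evaluate("2*a + 3*b")

# Division and transcendental functions: enough arithmetic per element
# that a VML-enabled build can use several cores profitably.
r2 = ne.evaluate("exp(a) / (1 + b)")

# True only if this numexpr build was compiled against MKL/VML.
print(ne.use_vml)
```

The split matches what Francesc describes: VML is only worth the dispatch overhead when the expression is compute-bound rather than bandwidth-bound.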
> Francesc Alted