[Numpy-discussion] Fast threading solution thoughts
Francesc Alted
faltet@pytables....
Thu Feb 12 04:05:54 CST 2009
Hi Brian,
On Thursday 12 February 2009, Brian Granger wrote:
> Hi,
>
> This is relevant for anyone who would like to speed up array based
> codes using threads.
>
> I have a simple loop that I have implemented using Cython:
>
> def backstep(np.ndarray opti, np.ndarray optf,
>              int istart, int iend, double p, double q):
>     cdef int j
>     cdef double *pi
>     cdef double *pf
>     pi = <double *>opti.data
>     pf = <double *>optf.data
>
>     with nogil:
>         for j in range(istart, iend):
>             pf[j] = (p*pi[j+1] + q*pi[j])
>
> I need to call this function *many* times and each time cannot be
> performed until the previous time is complete, as there are data
> dependencies. But, I still want to parallelize a single call to this
> function across multiple cores (notice that I am releasing the GIL
> before I do the heavy lifting).
>
> I want to break my loop range(istart,iend) into pieces and have a
> thread do each piece. The arrays have sizes 10^3 to 10^5.
>
> Things I have tried:
[clip]
If your problem is evaluating vector expressions like the one above
(i.e. without transcendental functions like sin, exp, etc.), the
bottleneck is usually memory access, so using several threads is not
going to help you achieve better performance. If anything it will hurt,
because you have to pay for the additional thread overhead. So,
frankly, I would not waste more time trying to parallelize that.
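
To make the memory-bound argument a bit more concrete, here is a rough
sketch (not Brian's code) of the same update written as a NumPy slice
expression, together with a count of how little arithmetic is done per
byte moved. The name backstep_numpy and the per-element byte counts are
illustrative assumptions, not measurements.

```python
import numpy as np

def backstep_numpy(opti, optf, istart, iend, p, q):
    """Vectorized equivalent of the quoted Cython loop (illustrative only)."""
    # pf[j] = p*pi[j+1] + q*pi[j] for j in [istart, iend)
    optf[istart:iend] = p * opti[istart + 1:iend + 1] + q * opti[istart:iend]

# Per element: 2 multiplications + 1 addition = 3 floating-point operations.
# Assumed traffic per element: one 8-byte load (pi[j+1]; pi[j] was already
# loaded on the previous iteration) and one 8-byte store (pf[j]).
FLOPS_PER_ELEMENT = 3
BYTES_PER_ELEMENT = 8 + 8
intensity = FLOPS_PER_ELEMENT / BYTES_PER_ELEMENT  # flops per byte moved

n = 10**5                       # upper end of the array sizes mentioned
opti = np.linspace(0.0, 1.0, n + 1)
optf = np.zeros(n + 1)
backstep_numpy(opti, optf, 0, n, 0.5, 0.5)
```

With well under one floating-point operation per byte, the cores spend
most of their time waiting for data, which is why adding threads buys
so little for this kind of expression.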

As an example, in the recent VML support in numexpr we have disabled
the use of VML (as well as the OpenMP threading support that comes with
it) in cases like yours, where only additions and multiplications are
performed (these operations are very fast on modern processors, and the
sole bottleneck in this case is memory bandwidth, as I've said).
However, for expressions containing operations like division or
transcendental functions, VML is activated automatically, and you can
make use of several cores if you want. So, if you are in this case,
and you have access to Intel MKL (the library that contains VML), you
may want to give numexpr a try.
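
For reference, a minimal sketch of what trying numexpr looks like on an
expression of the second kind (division plus transcendental functions).
The arrays a and b and the expression itself are made up for
illustration; whether VML is actually used depends on how your numexpr
build was compiled.

```python
import numpy as np
import numexpr as ne

# Hypothetical inputs, sized like the arrays discussed in this thread.
a = np.linspace(0.1, 1.0, 100000)
b = np.linspace(1.0, 2.0, 100000)

# Division and transcendental functions: the kind of expression for which
# VML (when available) can pay off. Names are looked up in the calling scope.
result = ne.evaluate("sin(a) / exp(b)")

# use_vml tells you whether this numexpr build was linked against VML.
print("VML available:", ne.use_vml)
```

The result should match np.sin(a) / np.exp(b) to within rounding.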
HTH,
--
Francesc Alted