[Numpy-discussion] Fast threading solution thoughts

Robert Kern robert.kern@gmail....
Wed Feb 11 23:52:40 CST 2009


On Wed, Feb 11, 2009 at 23:46, Brian Granger <ellisonbg.net@gmail.com> wrote:
> Hi,
>
> This is relevant for anyone who would like to speed up array based
> codes using threads.
>
> I have a simple loop that I have implemented using Cython:
>
> def backstep(np.ndarray opti, np.ndarray optf,
>             int istart, int iend, double p, double q):
>    cdef int j
>    cdef double *pi
>    cdef double *pf
>    pi = <double *>opti.data
>    pf = <double *>optf.data
>
>    with nogil:
>        for j in range(istart, iend):
>            pf[j] = (p*pi[j+1] + q*pi[j])
>
> I need to call this function *many* times, and each call cannot
> begin until the previous one is complete, as there are data
> dependencies.  But I still want to parallelize a single call to this
> function across multiple cores (notice that I am releasing the GIL
> before I do the heavy lifting).
>
> I want to break my loop range(istart,iend) into pieces and have a
> thread do each piece.  The arrays have sizes 10^3 to 10^5.
>
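[Editor's note: a minimal sketch of the chunking pattern described above, in pure Python. The function names (`backstep_serial`, `backstep_parallel`) are illustrative, not from the thread. With pure-Python workers the GIL prevents any actual speedup; the point is the range-splitting logic, which carries over when the per-chunk worker is a nogil Cython function.]

```python
from concurrent.futures import ThreadPoolExecutor

def backstep_serial(pi, pf, istart, iend, p, q):
    # Reference loop: the same recurrence as the Cython version above.
    for j in range(istart, iend):
        pf[j] = p * pi[j + 1] + q * pi[j]

def backstep_parallel(pi, pf, istart, iend, p, q, pool, nworkers):
    # Split [istart, iend) into contiguous chunks, one per worker.
    # Chunks are independent: pf[j] depends only on pi, never on pf,
    # so they can run concurrently within a single call.
    n = iend - istart
    chunk = (n + nworkers - 1) // nworkers
    futures = []
    for k in range(nworkers):
        lo = istart + k * chunk
        hi = min(lo + chunk, iend)
        if lo < hi:
            futures.append(pool.submit(backstep_serial, pi, pf, lo, hi, p, q))
    for f in futures:
        f.result()  # wait for completion, propagate any exceptions
```

The pool is passed in rather than created per call, matching the poster's point that per-call thread creation is prohibitively expensive.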
> Things I have tried:
>
> * Use pthreads and create new threads for each call to my function.
> This performed horribly due to the thread creation overhead.
> * Use a simple threadpool implementation in Python.  This performed
> horribly as well even though I was not recreating threads each call.
> The reason in this case was the time spent waiting on locks in the
> thread pool implementation (which is based on Queue and threading).
> This is either a problem with Python's threading module or a
> fundamental limitation of the pthreads library itself.
> * My next step is to implement a thread pool using pthreads/Cython.
> Assuming I do this right, this should be as fast as I can get using
> pthreads.
>
> The only tools that I know of that are *really* designed to handle
> these types of fine-grained parallel problems are:
>
> * Intel's TBB
> * OpenMP
> * Cilk++
> * ???
>
> These seem like pretty heavy solutions, though.

From a programmer's perspective, it seems to me like OpenMP is a much
lighter-weight solution than pthreads.
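[Editor's note: Cython exposes OpenMP through cython.parallel.prange. A minimal sketch of the quoted loop in that style, assuming the module is compiled with OpenMP flags (e.g. -fopenmp); the function name and typed-buffer arguments are illustrative:]

```cython
# cython: boundscheck=False, wraparound=False
# Sketch only: must be compiled with OpenMP enabled.
from cython.parallel import prange
import numpy as np
cimport numpy as np

def backstep_omp(np.ndarray[np.float64_t, ndim=1] opti,
                 np.ndarray[np.float64_t, ndim=1] optf,
                 int istart, int iend, double p, double q):
    cdef int j
    # prange releases the GIL and divides the iterations among an
    # OpenMP thread pool, which is created once and reused across calls.
    for j in prange(istart, iend, nogil=True):
        optf[j] = p * opti[j + 1] + q * opti[j]
```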

> So, do people have thoughts about ways of effectively using threads
> (not processes) for thin parallel loops over arrays?  This is relevant
> to Numpy itself as it would be very nice if all ufuncs could release
> the GIL and be run on multiple threads.

Eric Jones tried to do this with pthreads in C some time ago. His work is here:

  http://svn.scipy.org/svn/numpy/branches/multicore/

The lock overhead usually makes it not worthwhile.

And there's still an open problem for determining whether two strided
arrays overlap in memory.
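[Editor's note: for reference, later NumPy releases (1.11+, well after this thread) added np.shares_memory for an exact overlap check and np.may_share_memory for a cheap, conservative bounds-only check; a short illustration, assuming such a NumPy is available:]

```python
import numpy as np

a = np.arange(100, dtype=np.float64)
b = a[::3]          # strided view into a's buffer
c = a[1:50].copy()  # separate buffer with overlapping values

# Exact check: do the two arrays touch any common bytes?
assert np.shares_memory(a, b)
assert not np.shares_memory(a, c)

# Conservative check: only compares buffer bounds, so it can
# report True for arrays that do not actually overlap.
assert np.may_share_memory(a, b)
```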

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
