[Numpy-discussion] Question about Optimization (Inline, and Pyrex)
Tue Apr 17 15:02:24 CDT 2007
On 17/04/07, James Turner <email@example.com> wrote:
> Hi Anne,
> Your reply to Lou raises a naive follow-up question of my own...
> > Normally, python's multithreading is effectively cooperative, because
> > the interpreter's data structures are all stored under the same lock,
> > so only one thread can be executing python bytecode at a time.
> > However, many of numpy's vectorized functions release the lock while
> > running, so on a multiprocessor or multicore machine you can have
> > several cores at once running vectorized code.
> Are you saying that numpy's vectorized functions will perform a single
> array operation in parallel on a multi-processor machine, or just that
> the user can explicitly write threaded code to run *multiple* array
> operations on different processors at the same time? I hope that's not
> too stupid a question, but I haven't done any threaded programming yet
> and the answer could be rather useful...
For the most part, numpy's vectorized functions don't do anything
fancy computationally; they are just giant for loops, so a single
array operation runs on one processor. What they do do (and not
necessarily all of them) is release the GIL, so that another thread
can be doing something else while the loop runs. So it's the second
thing you describe: you have to write the threaded code yourself.

That said, some of them (dot, for example) use BLAS in certain
situations, and then all bets are off. At the least, a decent BLAS
implementation will be smart about cache behaviour; a fancy one might
actually vectorize the operation automatically. That would mean using
SSE3, though, or some vector processor (Cray?), and probably not SMP,
though I can't say for sure. The scipy linear algebra functions use
LAPACK, which is more likely to be able to make such speedups (in
fact I'm pretty sure there is an MPI-based LAPACK, though whether it's
a drop-in replacement I don't know).
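
To make the "write the threaded code yourself" case concrete, here is
a minimal sketch. It assumes a numpy build in which np.sqrt and the
sum reduction release the GIL for float64 arrays; the names worker
and threaded_sums are made up for illustration and are not part of
numpy. Each thread runs one vectorized operation on its own array.

```python
import threading

import numpy as np


def worker(data, out, index):
    # One vectorized operation per thread. Whether the GIL is released
    # inside np.sqrt / sum depends on the numpy version and dtype
    # (assumption: it is released for plain float64 arrays).
    out[index] = np.sqrt(data).sum()


def threaded_sums(arrays):
    # Hypothetical helper: start one thread per array, then wait for all.
    results = [None] * len(arrays)
    threads = [
        threading.Thread(target=worker, args=(a, results, i))
        for i, a in enumerate(arrays)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Four independent arrays, so four independent array operations.
arrays = [np.arange(1, 1000001, dtype=np.float64) * (k + 1) for k in range(4)]

threaded = threaded_sums(arrays)
serial = [np.sqrt(a).sum() for a in arrays]
```

The threaded and serial results should agree; any speedup depends on
how many cores are available and on how much of each call actually
runs with the GIL released, so time it on your own machine before
relying on it.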