[Numpy-discussion] newbie question - large dataset
Sat Apr 7 15:22:11 CDT 2007
On 4/7/07, Stefan van der Walt <email@example.com> wrote:
> On Sat, Apr 07, 2007 at 02:48:47PM -0400, Anne Archibald wrote:
> > If none of those algorithmic improvements are possible, you can look
> > at other possibilities for speeding things up (though the speedups
> > will be modest). Parallelism is an obvious one - if you've got a
> > multicore machine you may be able to cut your processing time by a
> > factor of the number of cores you have available with minimal effort
> > (for example by replacing a for loop with a simple foreach,
> > implemented as in the attached file).
> Would this code speed things up under Python? I was under the
> impression that there is only one process, irrespective of whether or
> not "threads" are used, and that the global interpreter lock is used
> when swapping between threads to make sure that only one executes at
> any instance in time.
You are correct. If g and h in the OP's description satisfy both of:
a) they are bloody expensive
b) they release the GIL internally via the proper C API calls, which
means they are promising not to touch any Python objects while
the lock is released
then the pure Python threads approach could help *somewhat*.
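
To make that case concrete, here is a minimal sketch. It is not the
foreach from Anne's attachment. The names expensive and threaded_map are
made up for illustration, and hashlib.sha256 only stands in for g and h
because it is a standard-library call documented to release the GIL
while hashing large buffers.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def expensive(block: bytes) -> str:
    # Hypothetical stand-in for the OP's g/h: hashlib drops the GIL
    # while it hashes a large buffer, so threads can overlap here.
    return hashlib.sha256(block).hexdigest()


def threaded_map(func, items, workers=4):
    # Apply func to every item using a small pool of threads;
    # results come back in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


# Eight 1 MB buffers, each filled with a different byte value.
blocks = [bytes([i]) * 1_000_000 for i in range(8)]
results = threaded_map(expensive, blocks)
```

If expensive were plain Python bytecode instead, the threads would take
turns holding the GIL and this would run no faster than a for loop.
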
But yes, for this kind of distribution problem in Python, a
multi-process approach is probably the better choice, if parallelism
is going to be used.
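
A rough sketch of the multi-process version, assuming the work is a sum
over all pairs of items. The function pair_cost is a hypothetical
stand-in for whatever the OP computes per pair, and rows_total,
split_ranges and parallel_total are names invented for this example.
Each worker process takes a contiguous slice of the outer loop.

```python
from multiprocessing import Pool


def pair_cost(a, b):
    # Hypothetical stand-in for the OP's expensive per-pair work.
    return (a - b) ** 2


def rows_total(args):
    # Worker: handle outer-loop rows lo..hi-1 against the full data set.
    data, lo, hi = args
    return sum(pair_cost(data[i], b) for i in range(lo, hi) for b in data)


def split_ranges(n, parts):
    # Cut range(n) into `parts` contiguous, nearly equal (lo, hi) slices.
    step, extra = divmod(n, parts)
    bounds, lo = [], 0
    for k in range(parts):
        hi = lo + step + (1 if k < extra else 0)
        bounds.append((lo, hi))
        lo = hi
    return bounds


def parallel_total(data, workers=4):
    # Each task carries its own copy of data, pickled to the worker.
    tasks = [(data, lo, hi) for lo, hi in split_ranges(len(data), workers)]
    with Pool(workers) as pool:
        return sum(pool.map(rows_total, tasks))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this on import.
    data = list(range(200))
    print(parallel_total(data))
```

Each process has its own interpreter and its own GIL, so this works even
when the per-pair function is pure Python. The price is that the data
has to be pickled and copied to every worker.
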
I suspect, however, that trying to lower the quadratic complexity of
the OP's formulation in the first place is probably a better idea.
Distribution lowers the constants, not the asymptotic behavior; as
Anne accurately pointed out, this is much more of an algorithmic