[Numpy-discussion] numpy.random and multiprocessing
Thu Dec 11 12:00:23 CST 2008
David Cournapeau wrote:
> Sturla Molden wrote:
>> On 12/11/2008 6:10 PM, Michael Gilbert wrote:
>>> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent
>>> this kind of error? A simple enough solution would be to also include
>>> the process id as part of the seed
>> It would not help, as the seeding is done prior to forking.
>> I am mostly familiar with Windows programming. But what is needed is a
>> fork handler (similar to a system hook in Windows jargon) that sets a
>> new seed in the child process.
>> Could pthread_atfork be used?
> The seed could be explicitly set in each task, no?
> import numpy as np
> def task(x):
>     np.random.seed()  # reseed (from OS entropy) at the start of each task
>     return np.random.random(x)
> But does this really make sense?
> Is the goal to parallelize a big sampler into N tasks of M trials, to
> produce the same result as a sequential set of M*N trials? Then it does
> not sound like a trivial task at all. I know there exist libraries
> explicitly designed for parallel random number generation; maybe this
> is where we should look, instead of using heuristics which are likely to
> be bogus and generate wrong results.
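Sturla's fork-handler idea maps onto `os.register_at_fork` in Python 3.7 and later (POSIX only; it did not exist when this thread was written). The following is a minimal sketch, assuming the `fork` start method and NumPy's legacy global generator (`np.random.seed`, `np.random.random`); the helper names `draw` and `draws_from_forked_children` are hypothetical. It first shows children that inherit the parent's seeded state producing identical draws, then registers a child-side hook that reseeds from OS entropy.

```python
import multiprocessing as mp
import os

import numpy as np


def draw(queue):
    # Hypothetical worker: one draw from NumPy's legacy global RandomState.
    queue.put(float(np.random.random()))


def draws_from_forked_children(n_children=4):
    """Hypothetical helper: fork n_children processes, collect one draw each."""
    ctx = mp.get_context("fork")  # the fork start method is POSIX-only
    queue = ctx.Queue()
    procs = [ctx.Process(target=draw, args=(queue,)) for _ in range(n_children)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]  # drain the queue before joining
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    np.random.seed(12345)  # parent is seeded once, before any fork
    before = draws_from_forked_children()
    # Each child starts from a copy of the parent's state, so all draws match.
    print("without hook:", before)

    # Fork handler: reseed the global RandomState from OS entropy in every child.
    os.register_at_fork(after_in_child=np.random.seed)
    after = draws_from_forked_children()
    print("with hook:   ", after)
```

Reseeding from OS entropy only makes the children's streams differ; as the reply below argues, that is not the same as guaranteeing independent, reproducible streams. The hook also fires only for forks made through `os.fork()`, which is what the `fork` start method uses.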
This is not sufficient because you cannot ensure that the seed will be
different every time task() is called.
A major part of the problem here is treating a parallel computing
problem as a serial computing problem. The streams must be independent
across threads; in particular, the streams of different threads must not
be cross-correlated (another gotcha). It is up to the user to implement a
thread-safe solution, such as using a single stream that is shared by all
threads, or forcing the different threads to start at different states. The
only thing that Numpy could do is provide a parallel pseudo-random number
generator.
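Later NumPy releases (1.17 and up) provide this kind of tool: `numpy.random.SeedSequence` derives child seeds intended to give non-overlapping, statistically independent streams with very high probability, and `numpy.random.default_rng` builds a separate `Generator` from each. The following is one possible way to apply it, assuming each task is identified by an integer index; `task` and `parallel_sample` are hypothetical names. Constructing `SeedSequence(root_seed, spawn_key=(i,))` inside the worker gives the same child that `SeedSequence(root_seed).spawn(n)[i]` would, so only integers need to be sent to the pool.

```python
import multiprocessing as mp

import numpy as np


def task(root_seed, task_index, size):
    """Draw `size` uniforms from the stream reserved for task `task_index`."""
    # Same child that SeedSequence(root_seed).spawn(n)[task_index] returns.
    seq = np.random.SeedSequence(root_seed, spawn_key=(task_index,))
    rng = np.random.default_rng(seq)  # private Generator, no shared state
    return rng.random(size)


def parallel_sample(root_seed, n_tasks, size, processes=4):
    """Run n_tasks tasks in a process pool; concatenate results in task order."""
    args = [(root_seed, i, size) for i in range(n_tasks)]
    with mp.Pool(processes) as pool:
        chunks = pool.starmap(task, args)  # starmap keeps input order
    return np.concatenate(chunks)


if __name__ == "__main__":
    a = parallel_sample(2008, n_tasks=8, size=1000, processes=4)
    b = parallel_sample(2008, n_tasks=8, size=1000, processes=2)
    # Each task depends only on (root_seed, task_index, size), so the pool
    # size and scheduling order do not change the result.
    print(np.array_equal(a, b))  # prints True
```

This makes a run reproducible for a fixed split into N tasks of M draws. It does not reproduce one sequential stream of M*N draws, which is the harder goal raised in the quoted message.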