[Numpy-discussion] numpy.random and multiprocessing

David Cournapeau cournape@gmail....
Thu Dec 11 12:31:47 CST 2008

On Fri, Dec 12, 2008 at 3:00 AM, Bruce Southey <bsouthey@gmail.com> wrote:
> David Cournapeau wrote:
>> Sturla Molden wrote:
>>> On 12/11/2008 6:10 PM, Michael Gilbert wrote:
>>>> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent
>>>> this kind of error?  A simple enough solution would be to also include
>>>> the process id as part of the seed
>>> It would not help, as the seeding is done prior to forking.
>>> I am mostly familiar with Windows programming. But what is needed is a
>>> fork handler (similar to a system hook in Windows jargon) that sets a
>>> new seed in the child process.
>>> Could pthread_atfork be used?
>> The seed could be explicitly set in each task, no ?
>> def task(x):
>>     np.random.seed()
>>     return np.random.random(x)
>> But does this really make sense ?
>> Is the goal to parallelize a big sampler into N tasks of M trials, to
>> produce the same result as a sequential set of M*N trials ? Then it does
>> sound like a trivial task at all. I know there exists libraries
>> explicitly designed for parallel random number generation - maybe this
>> is where we should look, instead of using heuristics which are likely to
>> be bogus, and generate wrong results.
>> cheers,
>> David
> This is not sufficient because you cannot ensure that the seed will be
> different every time task() is called.

Yes, right. I was assuming that each seed() call would result in a
/dev/urandom read - but the problem is the same whether the reseeding is
done in task or in a pthread_atfork handler anyway.
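The safer variant of the reseeding idea above is to pass each task an explicit, distinct seed rather than hoping that implicit reseeding after fork produces different states. A minimal sketch (the seed values and task sizes here are arbitrary choices for illustration):

```python
import numpy as np
from multiprocessing import Pool

def task(args):
    # Each task builds its own generator from an explicitly supplied seed,
    # so the streams do not depend on RNG state inherited across fork().
    seed, n = args
    rng = np.random.RandomState(seed)
    return rng.random_sample(n)

if __name__ == "__main__":
    with Pool(2) as pool:
        a, b = pool.map(task, [(0, 5), (1, 5)])
    # Distinct seeds give distinct (and reproducible) streams.
    print(np.allclose(a, b))
```

Note that distinct seeds make the streams different and reproducible, but - as discussed below - they do not by themselves guarantee statistical independence between the streams.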

> The
> only thing that Numpy could do is provide a parallel pseudo-random
> number generator.

Yes, exactly - hence my question whether this makes sense at all. Even
having different, "truly" random seeds does not guarantee that the
whole method makes sense - at least, I don't see why it should. In
particular, if the process should give the same result independently
of the number of parallel tasks, the problem becomes difficult.
Intrigued by the problem, I briefly looked into the literature on
parallel RNG; it certainly does not look like an easy task, and the
chance of getting it right without knowing the topic does not
look high.
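[Editorial note: NumPy later grew direct support for exactly this problem. The SeedSequence/Generator API (added in NumPy 1.17, long after this thread) spawns child seed sequences that are designed to yield statistically independent streams, one per worker. A brief sketch, with arbitrary example values:

```python
import numpy as np

# One root SeedSequence; spawn() derives independent child sequences,
# one per parallel task, regardless of how the work is later split.
n_tasks = 4
root = np.random.SeedSequence(12345)
child_seqs = root.spawn(n_tasks)

# Each task gets its own Generator built from its child sequence.
rngs = [np.random.default_rng(s) for s in child_seqs]
samples = [rng.random(3) for rng in rngs]
```

Because the child sequences are derived deterministically from the root seed, the whole parallel run is reproducible, and the streams are constructed to be independent - addressing the concern raised above about ad hoc per-process seeding.]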



More information about the Numpy-discussion mailing list