[Numpy-discussion] numpy.random and multiprocessing

David Cournapeau cournape@gmail....
Thu Dec 11 12:31:47 CST 2008


On Fri, Dec 12, 2008 at 3:00 AM, Bruce Southey <bsouthey@gmail.com> wrote:
> David Cournapeau wrote:
>> Sturla Molden wrote:
>>
>>> On 12/11/2008 6:10 PM, Michael Gilbert wrote:
>>>
>>>
>>>
>>>> Shouldn't numpy (and/or multiprocessing) be smart enough to prevent
>>>> this kind of error?  A simple enough solution would be to also include
>>>> the process id as part of the seed
>>>>
>>>>
>>> It would not help, as the seeding is done prior to forking.
>>>
>>> I am mostly familiar with Windows programming. But what is needed is a
>>> fork handler (similar to a system hook in Windows jargon) that sets a
>>> new seed in the child process.
>>>
>>> Could pthread_atfork be used?
>>>
>>>
>>
>> The seed could be explicitly set in each task, no?
>>
>> def task(x):
>>     np.random.seed()
>>     return np.random.random(x)
>>
>> But does this really make sense?
>>
>> Is the goal to parallelize a big sampler into N tasks of M trials, to
>> produce the same result as a sequential set of M*N trials? Then it does
>> not sound like a trivial task at all. I know there exist libraries
>> explicitly designed for parallel random number generation - maybe this
>> is where we should look, instead of using heuristics which are likely to
>> be bogus and generate wrong results.
>>
>> cheers,
>>
>> David
>>
> This is not sufficient because you cannot ensure that the seed will be
> different every time task() is called.

Yes, right. I was assuming that each call to seed() would result in a
/dev/urandom read - but the problem is the same whether it is done in
the task or in a pthread_atfork handler anyway.
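
For concreteness, this is roughly what the reseed-in-each-task idea
looks like with multiprocessing. It is only a sketch: the pool size and
sample counts are made up, and it assumes that an argument-less
np.random.seed() draws fresh entropy from the OS in each worker.

import numpy as np
from multiprocessing import Pool

def task(n):
    # Re-seed inside the worker, after the fork, so children do not
    # keep the identical RNG state inherited from the parent. With no
    # argument, np.random.seed() pulls entropy from the OS
    # (e.g. /dev/urandom), so each worker gets a different state.
    np.random.seed()
    return np.random.random(n)

if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(task, [5, 5, 5, 5])
    for r in results:
        print(r)

Even then, distinct seeds only make collisions unlikely; they say
nothing about the statistical independence of the resulting streams,
which is the real issue.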

> The
> only thing that Numpy could do is provide a parallel pseudo-random
> number generator.

Yes, exactly - hence my question whether this makes sense at all. Even
having different, "truly" random seeds does not guarantee that the
whole method makes sense - at least, I don't see why it should. In
particular, if the process should give the same result independently
of the number of parallel tasks, the problem becomes difficult.
Intrigued by the problem, I briefly looked into the literature on
parallel RNG; it certainly does not look like an easy task, and the
chance of getting it right without knowing the topic does not
look high.
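
The other obvious heuristic is to hand each task its own explicit seed,
which at least makes the run reproducible. Again just a sketch - the
base seed, the way per-task seeds are derived from it, and the task
sizes are all made up for illustration:

import numpy as np
from multiprocessing import Pool

def task(args):
    seed, n = args
    # Give each task a private generator instead of touching the
    # global numpy.random state.
    rng = np.random.RandomState(seed)
    return rng.random_sample(n)

if __name__ == "__main__":
    base_seed = 12345                               # arbitrary
    jobs = [(base_seed + i, 10) for i in range(4)]  # naive seed derivation
    with Pool(4) as pool:
        results = pool.map(task, jobs)

But consecutive seeds like these are exactly the kind of heuristic I am
worried about: nothing in the generator's design promises that the
streams they produce are independent, which is what the dedicated
parallel RNG literature is about.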

cheers,

David

