[Numpy-discussion] numpy.random and multiprocessing
Thu Dec 11 10:55:58 CST 2008
On 12/11/2008 5:39 PM, Gael Varoquaux wrote:
>>> Why do you say the results are the same ? They don't look the same to
>>> me - only the first three are the same.
>> He used the multiprocessing.Pool object. There is a possible race
>> condition here: one or more of the forked processes may be doing
>> nothing. They are all competing for tasks on a queue. It could be
>> avoided by using multiprocessing.Process instead.
> No, Pool is what I want, because in my production code I am submitting
> jobs to that pool.
Sure, a pool is fine. I was just speculating that one of the four
processes in your pool was idle all the time; i.e. that one of the other
three got to do the task twice. Therefore you only got three identical
results and not four. It depends on how the OS schedules the processes,
the number of logical CPUs, etc. You have no control over that. But if
you had used N instances of multiprocessing.Process instead, all N results
should have been identical (if the 'random' generator is completely
deterministic) - because each process would do the task once.
I.e. you only got three identical results due to a race condition in
the task queue.
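To make the point concrete, here is a minimal sketch of the
multiprocessing.Process variant. It assumes a POSIX system where the
"fork" start method is available; the names draw and run_forked_workers
are made up for illustration and are not from your code. Each child
inherits the parent's numpy.random state and does the task exactly once,
so all N rows should come out identical.

```python
import multiprocessing as mp

import numpy as np


def draw(send_end):
    # Runs in the child. The global numpy.random state was copied from the
    # parent at fork time, so every child starts from the same state.
    send_end.send(np.random.random(3).tolist())
    send_end.close()


def run_forked_workers(n):
    # Assumption: the "fork" start method exists (POSIX only).
    ctx = mp.get_context("fork")
    workers = []
    for _ in range(n):
        # duplex=False returns (receiving end, sending end).
        recv_end, send_end = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=draw, args=(send_end,))
        proc.start()
        workers.append((proc, recv_end))

    results = []
    for proc, recv_end in workers:
        results.append(recv_end.recv())
        proc.join()
    return results


if __name__ == "__main__":
    for row in run_forked_workers(4):
        print(row)
```

Since the parent never draws a number between forks, run_forked_workers(4)
should return four equal lists, however the OS schedules the children.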
But you don't want identical results, do you? So if you remember to seed
the random number generators after forking, this race condition should
be of no significance.
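One way to do the reseeding with a Pool is to pass an initializer, which
runs once in each worker process after the fork. A sketch, again assuming
the "fork" start method; reseed, draw and run_pool are hypothetical names,
not anything from your production code:

```python
import multiprocessing as mp

import numpy as np


def reseed():
    # Pool initializer: called once in each worker, after the fork.
    # With no argument, numpy reseeds from OS entropy (or the clock),
    # so the workers no longer share the state inherited from the parent.
    np.random.seed()


def draw(_):
    # The task: draw three numbers from the worker's global generator.
    return np.random.random(3).tolist()


def run_pool(n_tasks, n_workers=4):
    # Assumption: the "fork" start method exists (POSIX only).
    ctx = mp.get_context("fork")
    with ctx.Pool(n_workers, initializer=reseed) as pool:
        return pool.map(draw, range(n_tasks))


if __name__ == "__main__":
    for row in run_pool(4):
        print(row)
```

With the initializer in place it should not matter which worker picks up
which task: every worker draws from its own independently seeded state.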
> mtrand.pyx seems pretty clear about that: on import.
In which case they are initialized prior to forking.