[Numpy-discussion] numpy.random and multiprocessing

Sturla Molden sturla@molden...
Thu Dec 11 10:55:58 CST 2008


On 12/11/2008 5:39 PM, Gael Varoquaux wrote:

>>> Why do you say the results are the same ? They don't look the same to
>>> me - only the first three are the same.
> 
>> He used the multiprocessing.Pool object. There is a possible race 
>> condition here: one or more of the forked processes may be doing 
>> nothing. They are all competing for tasks on a queue. It could be 
>> avoided by using multiprocessing.Process instead.
> 
> No, Pool is what I want, because in my production code I am submitting
> jobs to that pool.

Sure, a pool is fine. I was just speculating that one of the four 
processes in your pool was idle all the time; i.e. that one of the other 
three got to do the task twice. Therefore you only got three identical 
results and not four. It depends on how the OS schedules the processes, 
the number of logical CPUs, etc. You have no control over that. But if 
you had used N instances of multiprocessing.Process instead, all N 
results should have been identical (if the 'random' generator is 
completely deterministic) - because each process would do the task once.

I.e. you only got three identical results due to a race condition in 
the task queue.
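
A minimal sketch of the Process variant, assuming the 'fork' start 
method (POSIX only). The names draw and run_processes are made up for 
illustration, and the parent seeds explicitly so that the generator 
state certainly exists before the fork. Each child does the task exactly 
once and inherits the same state, so all N draws should come out equal:

```python
import multiprocessing as mp

import numpy as np


def draw(queue):
    # No reseeding here: the child uses the global generator state
    # it inherited from the parent at fork time.
    queue.put(float(np.random.random()))


def run_processes(n=4):
    # Assumption: 'fork' is available (Linux and other POSIX systems).
    ctx = mp.get_context("fork")
    # Seed in the parent so the state is fixed before forking.
    np.random.seed(12345)
    queue = ctx.Queue()
    procs = [ctx.Process(target=draw, args=(queue,)) for _ in range(n)]
    for p in procs:
        p.start()
    # Collect one result per process before joining.
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(run_processes())
```

With a Pool there is no such guarantee, since nothing forces each worker 
to pick up exactly one task.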

But you don't want identical results, do you? So if you remember to 
seed the random number generators after forking, this race condition 
should be of no significance.
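
One way to do the reseeding with a Pool is to pass an initializer, which 
runs once in each worker after it has been forked. This is only a 
sketch, again assuming the 'fork' start method; reseed, draw_int and 
run_pool are hypothetical names. Calling numpy.random.seed() with no 
argument makes NumPy take fresh entropy from the OS:

```python
import multiprocessing as mp

import numpy as np


def reseed():
    # Runs once per worker, after the fork. With no argument,
    # seed() pulls fresh entropy from the operating system.
    np.random.seed()


def draw_int(_):
    # Uses the worker's own, freshly seeded global generator.
    return int(np.random.randint(0, 2**31 - 1))


def run_pool(n_tasks=4, n_workers=4):
    # Assumption: 'fork' is available (Linux and other POSIX systems).
    ctx = mp.get_context("fork")
    # Parent state that every worker would otherwise inherit unchanged.
    np.random.seed(0)
    with ctx.Pool(n_workers, initializer=reseed) as pool:
        return pool.map(draw_int, range(n_tasks))


if __name__ == "__main__":
    print(run_pool())
```

It then no longer matters which worker picks up which task, or whether 
one of them stays idle.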


> mtrand.pyx seems pretty clear about that: on import.

In which case they are initialized prior to forking.



Sturla Molden




