[Numpy-discussion] multiprocessing shared arrays and numpy

Gael Varoquaux gael.varoquaux@normalesup....
Fri Mar 5 07:46:00 CST 2010


On Fri, Mar 05, 2010 at 08:14:51AM -0500, Francesc Alted wrote:
> > FWIW, I observe very good speedups on my problems (pretty much linear in
> > the number of CPUs), and I have data parallel problems on fairly large
> > data (~100 MB apiece, doesn't fit in cache), with no synchronisation at
> > all between the workers. CPUs are Intel Xeons.

> Maybe your processes are not as memory-bound as you think. 

That's the only explanation that I can think of. I have two types of
bottlenecks. One is BLAS level 3 operations (mainly SVDs) on large
matrices, the second is resampling, where I repeat the same operation
many times over almost the same chunk of data. In both cases the data is
fairly large, so I expected the operations to be memory bound.

However, thinking about it, I recall that when I timed these operations
carefully, the processes seemed to alternate between a starving period,
during which they were IO-bound, and a productive period, during which
they were CPU-bound. After a few cycles, the periods of the different
processes would drift out of sync, with one process IO-bound and the
others CPU-bound, and the whole thing would become fairly efficient. Of
course, this is only possible because I have no cross-talk between the
processes.
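
A rough way to look for this kind of alternation is to compare wall-clock
time with CPU time inside each worker: a process that spends much of its
time blocked (waiting on IO, for instance) accumulates far less CPU time
than wall time. The sketch below is only an illustration, not the code I
actually timed. The names timed_task and cpu_fraction are hypothetical,
and the loop and the sleep are stand-ins for a real productive phase and a
real starving phase. Note that stalls on main memory are still counted as
CPU time, so this only reveals blocking waits, not cache misses.

```python
import time
from multiprocessing import Pool


def timed_task(n):
    """Run a stand-in workload and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()

    total = 0
    for i in range(n):       # stand-in for the productive, CPU-bound phase
        total += i * i
    time.sleep(0.05)         # stand-in for a starving phase (process blocked)

    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def cpu_fraction(wall, cpu):
    """Fraction of wall-clock time that the process spent on the CPU."""
    return cpu / wall if wall > 0 else 0.0


if __name__ == "__main__":
    # Independent workers, no communication between them (assumed setup).
    with Pool(processes=2) as pool:
        results = pool.map(timed_task, [200_000] * 4)
    for wall, cpu in results:
        busy = cpu_fraction(wall, cpu)
        print(f"wall={wall:.3f}s cpu={cpu:.3f}s busy={busy:.0%}")
```

A busy fraction well below 100% in every worker would suggest that the
workers spend a significant share of their time waiting rather than
computing.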

> Do you get much better speed-up by using NUMA than a simple multi-core
> machine with one single path to memory?  I don't think so, but maybe
> I'm wrong here.

I don't know. All the boxes around here have Intel CPUs, and I believe
that these are all SMP machines.

Gaël

