[Numpy-discussion] multiprocessing shared arrays and numpy

Francesc Alted faltet@pytables....
Fri Mar 5 09:22:07 CST 2010


On Friday 05 March 2010 14:46:00, Gael Varoquaux wrote:
> On Fri, Mar 05, 2010 at 08:14:51AM -0500, Francesc Alted wrote:
> > > FWIW, I observe very good speedups on my problems (pretty much linear
> > > in the number of CPUs), and I have data parallel problems on fairly
> > > large data (~100 MB apiece, doesn't fit in cache), with no
> > > synchronisation at all between the workers. CPUs are Intel Xeons.
> >
> > Maybe your processes are not as memory-bound as you think.
> 
> That's the only explanation that I can think of. I have two types of
> bottlenecks. One is BLAS level 3 operations (mainly SVDs) on large
> matrices, the second is resampling, where we repeat the same operation
> many times over almost the same chunk of data. In both cases the data is
> fairly large, so I expected the operations to be memory bound.

Not at all.  BLAS 3 operations are mainly CPU-bound, because the algorithms
(if they are correctly implemented, of course, but any decent BLAS 3 library
will do) have many chances to reuse data from the caches.  BLAS 1 operations
(and lately BLAS 2 as well) are the ones that are memory-bound.
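
To make the difference between the BLAS levels concrete, here is a rough
back-of-the-envelope sketch of flops per byte moved for one representative
operation per level.  The function name and the operation counts are my own
illustration: they assume double precision and that every operand travels
between memory and CPU exactly once, which is an idealisation and not a
measurement of any particular BLAS library.

```python
def arithmetic_intensity(n, itemsize=8):
    """Idealised flops per byte for size-n operations at each BLAS level.

    Assumption: each operand is moved between memory and CPU exactly once.
    """
    # Level 1, y = a*x + y: 2n flops; read x and y, write y (3n elements).
    axpy = (2 * n) / (3 * n * itemsize)
    # Level 2, y = A*x: 2n^2 flops; read A and x, write y (n^2 + 2n elements).
    gemv = (2 * n**2) / ((n**2 + 2 * n) * itemsize)
    # Level 3, C = A*B: 2n^3 flops; read A and B, write C (3n^2 elements).
    gemm = (2 * n**3) / (3 * n**2 * itemsize)
    return {"axpy": axpy, "gemv": gemv, "gemm": gemm}


if __name__ == "__main__":
    for name, value in arithmetic_intensity(1000).items():
        print(f"{name}: {value:.3f} flops/byte")
```

Under these assumptions the level 1 and level 2 ratios stay below a constant
no matter how large n gets, while the level 3 ratio grows linearly with n.
That growth is the room a blocked implementation has for reusing data that is
already in cache.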

And in your second case, you are repeating the same operation over the same 
chunk of data.  If this chunk is small enough to fit in cache, then the 
bottleneck is the CPU again (and probably access to L1/L2 cache), and not 
access to memory.  But if, as you said, you are seeing periods that are 
memory-bound (i.e. the CPUs are starving), then it may well be that this 
chunk size does not fit well in cache, and then your problem is memory access 
for this case.  Maybe you can get better performance by reducing your chunk 
size so that it fits in cache (L1 or L2).
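
As a minimal sketch of what reducing the chunk size could look like, the code
below splits the data into pieces sized to a cache budget and finishes all the
repeated passes on one piece before moving on to the next.  Everything in it
is hypothetical: the function names, the 256 KiB cache size, the 0.5 safety
fraction and the toy update are placeholders, and the reordering is only valid
when each pass is independent from one chunk to the next.

```python
def chunk_slices(n_items, itemsize, cache_bytes, fraction=0.5):
    """Yield slices whose byte size stays under fraction * cache_bytes."""
    # Number of items that fit in the chosen share of the cache (at least 1).
    chunk_len = max(1, int(cache_bytes * fraction) // itemsize)
    for start in range(0, n_items, chunk_len):
        yield slice(start, min(start + chunk_len, n_items))


def repeated_passes_chunked(data, passes, itemsize=8, cache_bytes=256 * 1024):
    """Run all passes over one chunk before moving on to the next chunk."""
    for sl in chunk_slices(len(data), itemsize, cache_bytes):
        block = data[sl]
        for _ in range(passes):
            # Toy elementwise update standing in for the real operation.
            block = [0.5 * v + 1.0 for v in block]
        data[sl] = block
    return data


if __name__ == "__main__":
    print(repeated_passes_chunked([0.0] * 10, passes=2, cache_bytes=64))
```

The same slices can index a NumPy array, where `data[sl]` is a view rather
than a copy, so the update can be done in place.  The right cache budget
depends on the machine, so it is worth timing a few sizes.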

So, I do not think that a NUMA architecture would run your computations any 
better than your current SMP platform (and you know that NUMA architectures 
are much more complex and expensive than SMP ones).  But experimenting is 
*always* the best answer to these hairy questions ;-)

-- 
Francesc Alted

