[Numpy-discussion] Parallelize repeated operations

Jonathan Tu jhtu@Princeton....
Thu Jun 23 15:08:03 CDT 2011


I have run into some very strange behavior that I cannot explain.  I would like to do something like this:

Given 4 pairs of numpy arrays (x1, y1), (x2, y2), (x3, y3), (x4, y4), I would like to compute every inner product ip_ij = np.dot(yi.T, xj), for i = 1,...,4 and j = 1,...,4.  (Something like a correlation matrix.)
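
For concreteness, here is a minimal serial sketch of the 16 inner products described above. The array shapes (column vectors of length 1000), the random data, and the names xs, ys and ip are assumptions made for illustration; they are not taken from the original setup.

```python
import numpy as np

# Hypothetical stand-in data: 4 pairs of column vectors of length 1000.
rng = np.random.default_rng(0)
xs = [rng.standard_normal((1000, 1)) for _ in range(4)]
ys = [rng.standard_normal((1000, 1)) for _ in range(4)]

# ip[i, j] holds np.dot(ys[i].T, xs[j]) for all 16 (i, j) combinations.
ip = np.empty((4, 4))
for i, y in enumerate(ys):
    for j, x in enumerate(xs):
        # np.dot of a (1, n) and an (n, 1) array returns a (1, 1) array.
        ip[i, j] = np.dot(y.T, x)[0, 0]
```

If the arrays are stacked column-wise, the same 4x4 result is a single matrix product, np.dot(Y.T, X), which can serve as a check on the loop.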

What I did was use shared memory and the multiprocessing module (with a Pool) to load the data in parallel. Each processor loads one pair of the snapshots, so I can do 4 simultaneous loads on my four-core machine, and it takes two cycles of loads to get all the data in memory.

Then I tried to do the inner products in parallel as well, asking each processor to do 4 of the 16 total inner products.  As it turned out, this was slower in parallel than in serial!
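
A rough sketch of that division of work, with each task computing the 4 inner products for one y against every x. The function names row_of_inner_products and inner_products_parallel are hypothetical, and this version simply passes the arrays to the workers as arguments (so they are pickled) rather than using shared memory as in the setup described above.

```python
import numpy as np
from multiprocessing import Pool


def row_of_inner_products(task):
    """Compute the inner products of one y with every x (one row of ip)."""
    y, xs = task
    return [float(np.dot(y.T, x)[0, 0]) for x in xs]


def inner_products_parallel(xs, ys, n_procs=4):
    """Split the len(ys) * len(xs) inner products into one task per y.

    Assumption: the arrays are sent to the workers by pickling, not
    through shared memory.
    """
    tasks = [(y, xs) for y in ys]
    with Pool(processes=n_procs) as pool:
        rows = pool.map(row_of_inner_products, tasks)
    return np.array(rows)
```

To try it, call inner_products_parallel(xs, ys) from under an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that spawn rather than fork worker processes.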

This only occurs when I use numpy functions.  If I instead replace the inner product task with printing to stdout or something of that sort, I get the 4x speedup that I expect.  Any ideas?

Jonathan Tu
