[Numpy-discussion] Parallelize repeated operations
Thu Jun 23 15:08:03 CDT 2011
I have found a very strange bug that I cannot understand. I would like to do something like this:
Given 4 pairs of numpy arrays (x1, y1, x2, y2, x3, y3, x4, y4), I would like to compute each corresponding inner product ip = np.dot(yi.T, xj), for i = 1,...,4 and j = 1,...,4. (Something like a correlation matrix.)
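To make the computation concrete, here is a minimal serial sketch. The array shapes and the random data are made up for illustration; only the np.dot(yi.T, xj) pattern comes from the description above. It also shows that, if all arrays share the same number of rows, the 16 products are the blocks of a single matrix product of the stacked arrays.

```python
import numpy as np

# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
n_rows, n_cols = 1000, 5
xs = [rng.standard_normal((n_rows, n_cols)) for _ in range(4)]
ys = [rng.standard_normal((n_rows, n_cols)) for _ in range(4)]

# ip[i][j] = np.dot(yi.T, xj): 16 blocks, each of shape (n_cols, n_cols).
ip = [[np.dot(y.T, x) for x in xs] for y in ys]

# The same 16 blocks come out of one product of the column-stacked arrays.
Y = np.hstack(ys)        # shape (n_rows, 4 * n_cols)
X = np.hstack(xs)        # shape (n_rows, 4 * n_cols)
full = np.dot(Y.T, X)    # shape (4 * n_cols, 4 * n_cols)

# Block (i, j) of `full` is ip[i][j].
i, j = 1, 2
block = full[i * n_cols:(i + 1) * n_cols, j * n_cols:(j + 1) * n_cols]
print(np.allclose(block, ip[i][j]))
```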
What I did was use shared memory and the multiprocessing module (with pool) to load the data in parallel. Each processor loads one pair of the snapshots, so I can do 4 simultaneous loads on my four-core machine, and it takes two cycles of loads to get all the data in memory.
Then I tried to do the inner products in parallel as well, asking each processor to do 4 of the 16 total inner products. As it turned out, this was slower in parallel than in serial!
This only occurs when I use numpy functions. If instead I replace the inner product task by printing to stdout or something of that sort, I get the 4x speedup that I expect. Any ideas?
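For reference, the parallel step can be sketched roughly as follows. This is a simplified stand-in, not the actual code: the function names (row_of_products, serial_products, parallel_products), the array shapes, and the random data are all made up for illustration, and it uses plain module-level arrays rather than the shared-memory setup. Each task computes 4 of the 16 inner products, as described above.

```python
import multiprocessing as mp
import numpy as np

# Hypothetical sizes and random data, for illustration only.
# Module-level arrays: inherited by workers on fork, or rebuilt identically
# from the fixed seed if workers re-import this module.
rng = np.random.default_rng(0)
xs = [rng.standard_normal((1000, 5)) for _ in range(4)]
ys = [rng.standard_normal((1000, 5)) for _ in range(4)]

def row_of_products(i):
    """Return the 4 inner products np.dot(yi.T, xj) for a fixed i."""
    return [np.dot(ys[i].T, x) for x in xs]

def serial_products():
    """Compute all 16 inner products in the current process."""
    return [row_of_products(i) for i in range(4)]

def parallel_products(n_workers=4):
    """Compute the same 16 products, one row of 4 per pool task."""
    # Results are pickled and sent back to the parent process.
    with mp.Pool(n_workers) as pool:
        return pool.map(row_of_products, range(4))

if __name__ == "__main__":
    serial = serial_products()
    parallel = parallel_products()
    same = all(np.allclose(serial[i][j], parallel[i][j])
               for i in range(4) for j in range(4))
    print("serial and parallel results agree:", same)
```

Wrapping the two calls in the main block with time.perf_counter() is one way to reproduce the serial versus parallel comparison on a given machine.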