# [Numpy-discussion] Parallelizable Performance Python Example

James Snyder jbsnyder@fanplastic....
Tue Sep 22 12:03:14 CDT 2009

```Hi -

I've recently been trying to adjust the performance python example (http://www.scipy.org/PerformancePython
) so that it could be compared under a parallelized version.  I've
adjusted the Gauss-Seidel 4 point method to a red-black checkerboarded
is much more easily parallelized on a shared memory system.

I've got some examples of this working, but I seem to be having
trouble making it anywhere near efficient for the NumPy example (it 's
around an order of magnitude slower than the non-red-black version).

Here's essentially what I'm doing with the NumPy solver:

def numericTimeStep(self, dt=0.0):
"""
Takes a time step using a NumPy expression.
This has been adjusted to use checkerboard style indexing.
"""
g = self.grid
dx2, dy2 = np.float32(g.dx**2), np.float32(g.dy**2)
dnr_inv = np.float32(0.5/(dx2 + dy2))
u = g.u
g.old_u = u.copy() # needed to compute the error.

if self.count == 0:
# Precompute Matrix Indexes
X, Y = np.meshgrid(range(1,u.shape[0]-1),range(1,u.shape
[1]-1))
checker = (X+Y) % 2
self.idx1 = checker==1
self.idx2 = checker==0

# The actual iteration
g.u[1:-1, 1:-1][self.idx1] = ((g.u[0:-2, 1:-1][self.idx1] + g.u
[2:, 1:-1][self.idx1])*dy2 +
(g.u[1:-1,0:-2][self.idx1] + g.u[1:-1, 2:]
[self.idx1])*dx2)*dnr_inv

g.u[1:-1, 1:-1][self.idx2] = ((g.u[0:-2, 1:-1][self.idx2] + g.u
[2:, 1:-1][self.idx2])*dy2 +
(g.u[1:-1,0:-2][self.idx2] + g.u[1:-1, 2:]
[self.idx2])*dx2)*dnr_inv

return g.computeError()

Any ideas?  I presume that the double-indexing is maybe what's killing
this, and I could precompute some boolean indexing arrays, but the
original version of this solver (plain Gauss-Seidel, 4 point
averaging) is rather simple and clean :-):

def numericTimeStep(self, dt=0.0):
"""Takes a time step using a NumPy expression."""
g = self.grid
dx2, dy2 = g.dx**2, g.dy**2
dnr_inv = 0.5/(dx2 + dy2)
u = g.u
g.old_u = u.copy() # needed to compute the error.

# The actual iteration
u[1:-1, 1:-1] = ((u[0:-2, 1:-1] + u[2:, 1:-1])*dy2 +
(u[1:-1,0:-2] + u[1:-1, 2:])*dx2)*dnr_inv

return g.computeError()

Here's a pure python version of the red-black solver (which is, of
course, incredibly slow, but not that much slower than the non-red-
black version):
def slowTimeStep(self, dt=0.0):
"""Takes a time step using straight forward Python loops."""
g = self.grid
nx, ny = g.u.shape
dx2, dy2 = np.float32(g.dx**2), np.float32(g.dy**2)
dnr_inv = np.float32(0.5/(dx2 + dy2))
u = g.u

err = 0.0
for offset in range(0,2):
for i in range(1, nx-1):
for j in range(1 + (i + offset) % 2, ny-1, 2):
tmp = u[j,i]
u[j,i] = ((u[j-1, i] + u[j+1, i])*dx2 +
(u[j, i-1] + u[j, i+1])*dy2)*dnr_inv
diff = u[j,i] - tmp
err += diff*diff

return np.sqrt(err)

--
James Snyder
Biomedical Engineering
Northwestern University
jbsnyder@fanplastic.org
http://fanplastic.org/key.txt
ph: 847.448.0386
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4012 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/numpy-discussion/attachments/20090922/b5e1a41e/attachment.bin
```