[Numpy-discussion] Assigning complex value to real array

Andrew P. Mullhaupt doc@zen-pharaohs....
Thu Oct 7 21:20:23 CDT 2010


  On 10/7/2010 6:52 PM, Matthew Brett wrote:
> Hi,
>
> On Thu, Oct 7, 2010 at 3:47 PM, Andrew P. Mullhaupt
> <doc@zen-pharaohs.com>  wrote:
>>> Most machines now
>>> and in the future are not going to choke on these issues (for a variety
>>> of reasons).
> 'Talk is cheap, show me the code' .

Yes, let's.

First we want to initialize memory mapped arrays which will be used for 
the variables. This python script does that:

from numpy import *

N = 819200

X = random.uniform(0, 1, 2*N)
Y = random.uniform(0, 1, 2*N)
Z = random.uniform(0, 1, 2*N)

p = random.uniform(0, 1, N)
q = random.uniform(0, 1, N)
r = random.uniform(0, 1, N)
s = random.uniform(0, 1, N)
w = random.uniform(0, 1, N)
z = random.uniform(0, 1, N)

save('X', X)
save('Y', Y)
save('Z', Z)
save('p', p)
save('q', q)
save('r', r)
save('s', s)
save('w', w)
save('z', z)


Note that we have two arrays of length 2*N representing complex arrays 
with interleaved real and imaginary parts, and we have four arrays of 
length N representing the segregated real and complex parts.

We also create arrays for the sums; these are not sparse (since we are 
concerned with the case where there actually are imaginary parts that 
would require memory bandwidth).

Since we want to check the memory bandwidth we need a test which 
essentially shovels these arrays through memory very quickly. I have 
chosen to produce a result to ensure that everything actually gets done 
- although this is probably not strictly necessary, why not.

First the interleaved version:

from numpy import *

X = load('X.npy', mmap_mode='r')
Y = load('Y.npy', mmap_mode='r')
Z = load('Z.npy', mmap_mode='r+')

N = Z.size / 2

its = int(8192*100000 / N)

for k in range(its) :
     Z[:] = X + Y

print Z.max()

Then the segregated version:

from numpy import *

p = load('p.npy', mmap_mode='r')
q = load('q.npy', mmap_mode='r')
r = load('r.npy', mmap_mode='r')
s = load('s.npy', mmap_mode='r')
w = load('w.npy', mmap_mode='r+')
z = load('z.npy', mmap_mode='r+')

N = s.size

its = int(8192*100000 / N)

for k in range(its) :
     w[:] = p + r
     z[:] = s + q

print maximum(w.max(), z.max())

Note that the segregated version does not "cheat" by computing  p + q  
and  r + s  which presumably span less address space. In fact, we would 
not expect that timing to be too much different.

Now lets conduct the interleaved timing:

andrew@flyer:/media/fraid/andrew/junk$ time python interleave.py
1.99960173329

real    0m6.438s
user    0m6.400s
sys    0m0.020s

and the segregated one:

andrew@flyer:/media/fraid/andrew/junk$ time python segregate.py
1.99858982651

real    0m6.198s
user    0m6.140s
sys    0m0.050s


Well OK, I would describe this as pretty close. Oddly enough the 
segregated version is faster.

Another pair of runs with larger files:

andrew@flyer:/media/fraid/andrew/junk$ time python interleave.py
1.99995091618

real    0m11.480s
user    0m6.070s
sys    0m4.400s

andrew@flyer:/media/fraid/andrew/junk$ time python segregate.py
1.99989270428

real    0m12.865s
user    0m7.250s
sys    0m4.330s

Here we end up with another close run, but the segregated  version is 
slower.

So yeah, here is code you can use to see if your machine chokes on the 
old sort of issues (which were very common in early RISC workstation 
architectures - that awful TLB on the RS/6000 series springs immediately 
to mind).

You can strobe through file sizes if you want to try and tickle one of 
those weirder creases in memory architecture on your machine, but you 
will probably not find anything that blatant. People who design 
computers have been through this rodeo a couple times.

Best regards,
Andrew Mullhaupt


More information about the NumPy-Discussion mailing list