# [Numpy-discussion] performance comparison of C++ vs Numeric (MA) operations.

Chris Barker chrishbarker at home.net
Wed Jun 13 14:24:42 CDT 2001

```
If I read your C++ right (and I may not have; I'm a C++ novice), you
allocated the memory for all three arrays once and then performed your
loop. In the Python version, the result array is allocated each time the
multiplication is performed, so you are allocating and freeing the result
array on every pass through the loop. That may slow things down a little.
In a real application you are less likely to redo the same computation
over and over, so the allocation would happen only once.

Note also that Python function calls carry some overhead, so you may get
some speedup if you inline the call to mult_test. You can decide for
yourself whether that would still be a fair comparison.

You might try passing a preallocated result array as the third argument
to multiply() and see if it is any faster; it is certainly more memory
efficient. (Unfortunately, MA doesn't seem to support the third argument
to multiply.)

My version of your test (I don't have TimerUtility, so I used time.clock
instead) got these times:

completed 1000 in 99.050000 seconds
3.74e+06 checked multiplies/second

The alternative, which multiplies into a preallocated result array, got:

alternative completed 1000 in 80.070000 seconds
4.62e+06 checked multiplies/second

So it did buy you something: about 19% less time for the same work.
Here is the code:
#!/usr/bin/env python2.1
# Test harness for masked array (MA) vs. Numeric multiply performance
import sys

# from MA import *   # uncomment to time masked arrays (the in-place
#                    # loop below needs Numeric's three-argument multiply)
from Numeric import *

from time import clock


def mult_test(a1, a2):
    # Allocates a new result array on every call
    return a1 * a2


if __name__ == '__main__':
    repeat = 100
    gates = 1000
    beams = 370

    if len(sys.argv) > 1:
        repeat = int(sys.argv[1])

    t1 = ones((beams, gates), Float)
    a1 = t1
    a2 = t1

    # Original version: a fresh result array each time through the loop
    i = 0
    start = clock()
    while i < repeat:
        i = i + 1
        res = mult_test(a1, a2)

    elapsed = clock() - start
    print 'completed %d in %f seconds' % (repeat, elapsed)
    cntMultiply = repeat * gates * beams
    print '%8.3g checked multiplies/second' % (cntMultiply / elapsed)
    print

    # Alternative: allocate the result array once, then have multiply()
    # write into it through its third (output) argument
    res = zeros(a1.shape, Float)

    i = 0
    start = clock()
    while i < repeat:
        i = i + 1
        multiply(a1, a2, res)

    elapsed = clock() - start
    print 'alternative completed %d in %f seconds' % (repeat, elapsed)
    cntMultiply = repeat * gates * beams
    print '%8.3g checked multiplies/second' % (cntMultiply / elapsed)
    print

Another note: calling ones() with Float as the type gives you elements of
Python's float type, which is a C double. Use 'f' or Float32 to get a C
float. I've found that on Intel hardware doubles are just as fast (the FPU
uses doubles anyway), but they do use twice the memory, so this could make
a difference.

-Chris

--
Christopher Barker, Ph.D.
ChrisHBarker at home.net
http://members.home.net/barkerlohmann
Oil Spill Modeling
Water Resources Engineering
Coastal and Fluvial Hydrodynamics

```
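For readers on modern NumPy (Numeric's successor), the preallocation idea from this thread can be sketched as follows. This is a translation under assumptions, not the code from the post: NumPy's ufuncs take the output array as the `out=` keyword, and `multiply_loop` is a hypothetical helper that assumes `a1` and `a2` share a shape and a floating-point dtype.

```python
import numpy as np


def multiply_loop(a1, a2, repeat, preallocate):
    """Multiply a1 * a2 `repeat` times and return the last result array.

    preallocate=False: every iteration allocates a fresh result array,
    like `res = a1 * a2` in the original harness.
    preallocate=True: one result array is allocated up front and
    np.multiply writes into it via `out=` (Numeric's third argument).
    Assumes a1 and a2 have the same shape and floating-point dtype.
    """
    if preallocate:
        res = np.empty_like(a1)
        for _ in range(repeat):
            np.multiply(a1, a2, out=res)
    else:
        res = None
        for _ in range(repeat):
            res = a1 * a2
    return res


if __name__ == "__main__":
    beams, gates = 370, 1000
    a = np.ones((beams, gates), dtype=np.float64)
    fresh = multiply_loop(a, a, 10, preallocate=False)
    reused = multiply_loop(a, a, 10, preallocate=True)
    print(np.array_equal(fresh, reused))  # same values either way
```

Both paths compute the same values; the in-place path simply keeps a single result buffer alive, which at 370 x 1000 doubles is about 3 MB per array.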
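The point about Python function-call overhead can be checked in isolation with the standard-library `timeit` module. The sketch below assumes NumPy in place of Numeric; `mult_test` mirrors the harness function from the post, and `compare_call_overhead` is a hypothetical helper. Timings depend on the machine, so no output is shown.

```python
import timeit

import numpy as np


def mult_test(a1, a2):
    # Same work as the inline expression, plus one Python function call
    return a1 * a2


def compare_call_overhead(shape=(10, 10), number=100_000):
    """Time `mult_test(a1, a2)` against the inlined `a1 * a2`.

    Small arrays make the per-call overhead a visible fraction of the
    total; with 370 x 1000 arrays it is likely negligible next to the
    multiply itself.
    """
    a1 = np.ones(shape)
    a2 = np.ones(shape)
    env = {"mult_test": mult_test, "a1": a1, "a2": a2}
    t_call = timeit.timeit("mult_test(a1, a2)", globals=env, number=number)
    t_inline = timeit.timeit("a1 * a2", globals=env, number=number)
    return t_call, t_inline


if __name__ == "__main__":
    t_call, t_inline = compare_call_overhead()
    print(f"function call: {t_call:.3f} s, inlined: {t_inline:.3f} s")
```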
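The "checked multiplies/second" figure in the post is just `repeat * gates * beams / elapsed`. The sketch below reproduces the reported numbers from the reported times, assuming Python 3; `multiplies_per_second` and `timed` are hypothetical helpers. Because `time.clock` was removed in Python 3.8, `timed` uses `time.perf_counter`, the usual choice for wall-clock benchmarks.

```python
import time


def multiplies_per_second(repeat, beams, gates, elapsed):
    """Throughput as computed in the harness: repeat * gates * beams / elapsed."""
    return repeat * gates * beams / elapsed


def timed(fn, repeat):
    """Call fn() `repeat` times and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Reported times for 1000 repeats of a 370 x 1000 multiply
    for label, elapsed in [("original", 99.05), ("alternative", 80.07)]:
        rate = multiplies_per_second(1000, 370, 1000, elapsed)
        print("%s: %8.3g checked multiplies/second" % (label, rate))
    # original: 3.74e+06 checked multiplies/second
    # alternative: 4.62e+06 checked multiplies/second
```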
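The memory difference between Float and Float32 is easy to quantify. In NumPy terms (an assumption standing in for Numeric's `Float` and `'f'`/`Float32` typecodes), a 370 x 1000 array of doubles takes twice the bytes of the same array of C floats:

```python
import numpy as np

beams, gates = 370, 1000

# Python's float is a C double, i.e. float64
print(np.dtype(float) == np.float64)  # True

a64 = np.ones((beams, gates), dtype=np.float64)  # like Numeric's Float
a32 = np.ones((beams, gates), dtype=np.float32)  # like 'f' / Float32

print(a64.nbytes)  # 2960000 bytes: 370 * 1000 * 8
print(a32.nbytes)  # 1480000 bytes: 370 * 1000 * 4
```

With three such arrays live at once (two inputs and a result), the choice of element type is the difference between roughly 8.9 MB and 4.4 MB.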