[Numpy-discussion] performance comparison of C++ vs Numeric (MA) operations.
Chris Barker
chrishbarker at home.net
Wed Jun 13 14:24:42 CDT 2001
If I read your C++ right (and I may not have; I'm a C++ novice), you
allocated the memory for all three arrays and then performed your loop.
In the Python version, the result array is allocated when the
multiplication is performed, so you are allocating and freeing the result
array each time through the loop. That may slow things down a little. In a
real application, you are less likely to be redoing the same
computation over and over again, so the allocation would happen only
once.
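For comparison, here is the allocate-once idea in modern NumPy, the successor to Numeric. This is a minimal sketch assuming Python 3 and NumPy rather than the Numeric package used in this thread: the output array is passed as the `out` argument of the `multiply` ufunc, and the ufunc hands back that same array instead of allocating a new one. The helper name `repeated_multiply` is hypothetical, made up for illustration.

```python
import numpy as np

beams, gates = 370, 1000
a1 = np.ones((beams, gates))
a2 = np.ones((beams, gates))


def repeated_multiply(a1, a2, repeat):
    """Hypothetical helper: multiply `repeat` times, reusing one result array."""
    res = np.empty_like(a1)  # allocated once, outside the loop
    for _ in range(repeat):
        # Writes the products into res's existing memory; no new array per pass.
        np.multiply(a1, a2, out=res)
    return res


# Allocating form, as in the original loop: a fresh array on every evaluation.
fresh = a1 * a2

# Preallocated form: the ufunc returns the very array it was given as `out`.
res = np.zeros(a1.shape)
returned = np.multiply(a1, a2, out=res)
print(returned is res)  # True

print(np.array_equal(repeated_multiply(a1, a2, 10), fresh))  # True
```

The results are identical; the only difference is whether each pass through the loop pays for a new allocation.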
Note also that there is some overhead in function calls in Python, so
you may get some speed-up if you inline the call to mult_test. You can
decide for yourself whether this would still be a fair comparison.
You might try something like this, and see if it is any faster (it is
more memory efficient). Unfortunately, MA doesn't seem to support the
third (output) argument to multiply.
My version (I don't have TimerUtility, so I used time.clock instead) got
these times:
Your code:
completed 1000 in 99.050000 seconds
3.74e+06 checked multiplies/second
My code:
alternative completed 1000 in 80.070000 seconds
4.62e+06 checked multiplies/second
It did buy you something (roughly 19% less time). Here is the code:
#!/usr/bin/env python2.1
import sys
# test harness for Masked array performance
#from MA import *
from Numeric import *
from time import clock

def mult_test(a1, a2):
    # a1 * a2 allocates a brand-new result array on every call
    res = a1 * a2
    return res

if __name__ == '__main__':
    repeat = 100
    gates = 1000
    beams = 370
    if len(sys.argv) > 1:
        repeat = int(sys.argv[1])
    t1 = ones((beams, gates), Float)
    a1 = t1
    a2 = t1
    # a1 = masked_values(t1, -327.68)
    # a2 = masked_values(t1, -327.68)

    # original version: a new result array on each pass through the loop
    i = 0
    start = clock()
    while (i < repeat):
        i = i+1
        res = mult_test(a1, a2)
    elapsed = clock() - start
    print 'completed %d in %f seconds' % (repeat , elapsed)
    cntMultiply = repeat*gates*beams
    print '%8.3g checked multiplies/second' % (cntMultiply/elapsed)
    print

    # alternative: allocate the result once, then pass it to multiply
    # as the third (output) argument so it is reused every time
    res = zeros(a1.shape,Float)
    i = 0
    start = clock()
    while (i < repeat):
        i = i+1
        multiply(a1, a2, res)
    elapsed = clock() - start
    print 'alternative completed %d in %f seconds' % (repeat , elapsed)
    cntMultiply = repeat*gates*beams
    print '%8.3g checked multiplies/second' % (cntMultiply/elapsed)
    print
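The function-call overhead in mult_test can also be measured on its own. A minimal sketch, assuming Python 3 and modern NumPy rather than the Numeric/Python 2.1 setup above: `timeit` runs the same multiply once through a wrapper function and once inline. The per-call overhead is a roughly fixed cost, so it should be noticeable for tiny arrays and negligible next to the work of multiplying 370 x 1000 arrays. The helper `compare_call_overhead` is hypothetical.

```python
import timeit

import numpy as np


def mult_test(a1, a2):
    # Same work as the inline expression, plus one Python-level function call.
    return a1 * a2


def compare_call_overhead(shape, number):
    """Hypothetical helper: time `number` multiplies of an all-ones array.

    Returns (seconds_through_function, seconds_inline).
    """
    a = np.ones(shape)
    env = {"mult_test": mult_test, "a": a}
    t_call = timeit.timeit("mult_test(a, a)", globals=env, number=number)
    t_inline = timeit.timeit("a * a", globals=env, number=number)
    return t_call, t_inline


if __name__ == "__main__":
    # Tiny arrays: the call overhead is a larger fraction of each iteration.
    print("10 elements: call %.4f s, inline %.4f s"
          % compare_call_overhead((10,), 100000))
    # The array shape from the harness: the multiply itself should dominate.
    print("370 x 1000:  call %.4f s, inline %.4f s"
          % compare_call_overhead((370, 1000), 50))
```

Absolute numbers depend entirely on the machine, so only the ratio between the two timings in each row is worth comparing.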
Another note: calling ones with Float as your type gives you a Python
float, which is a C double. Use 'f' or Float32 to get a C float. I've
found that on Intel hardware doubles are just as fast (the FPU uses doubles
anyway), but they do use twice as much memory (8 bytes per element
instead of 4), so this could make a difference.
-Chris
--
Christopher Barker, Ph.D.
ChrisHBarker at home.net
http://members.home.net/barkerlohmann
Oil Spill Modeling
Water Resources Engineering
Coastal and Fluvial Hydrodynamics