[Numpy-discussion] MaskedArray __setitem__ Performance
Alexander Michael
lxander.m@gmail....
Fri Feb 15 22:12:37 CST 2008
In part of some code I'm rewriting from carrying around a data and
mask array to using MaskedArray, I read data into an array from an
input stream. By its nature this is a "one at a time" process, so it is
basically a loop over assigning single elements (in no predetermined
order) of already allocated arrays. Unfortunately, using MaskedArray
in this way is significantly slower. The sample code below
demonstrates that for this particular procedure, filling the
MaskedArray is 32x slower than working with the two arrays I had been
carrying around. It appears that I can regain the fill performance by
working on _data and _mask directly. I can guarantee that the
MaskedArrays I'm working with have been created with a dense mask as
I've done below (there are always some masked elements, so there is no
gain in shrinking to nomask). Is this safe? If not, can I make it safe
for this particular performance-critical section? I'm assuming that
doing array operations won't incur this sort of penalty when I get
further into my translation. Some overhead is acceptable for the
convenience of not dragging around the mask and thinking about it all
of the time, but hopefully less than 2x slower.
Thanks!
Alex
import numpy
def get_ndarrays():
    return (numpy.zeros((5000,500), dtype=float),
            numpy.ones((5000,500), dtype=bool))
import timeit
t_base = timeit.Timer(
    'a[0,0] = 1.0; m[0,0] = False',
    'from __main__ import get_ndarrays; a,m = get_ndarrays()'
    ).timeit(1000)/1000
print t_base
6.97574691756e-007
import numpy.ma
def get_maskedarray():
    return numpy.ma.MaskedArray(
        numpy.zeros((5000,500), dtype=float),
        numpy.ones((5000,500), dtype=bool)
        )
t_ma = timeit.Timer(
    'a[0,0] = 1.0',
    'from __main__ import get_maskedarray; a = get_maskedarray()'
    ).timeit(1000)/1000
print t_ma, t_ma/t_base
2.26880790715e-005 32.5242290749
t_ma_com = timeit.Timer(
    'd[0,0] = 1.0; m[0,0] = False',
    'from __main__ import get_maskedarray; a = get_maskedarray(); '
    'd,m = a._data,a._mask'
    ).timeit(1000)/1000
print t_ma_com, t_ma_com/t_base
7.34450886914e-007 1.05286343612
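
For reference, the workaround described above (filling the data and mask arrays directly instead of going through MaskedArray.__setitem__) might look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names make_masked and fill_direct are hypothetical, it uses the public accessors numpy.ma.getdata and numpy.ma.getmask rather than _data and _mask, and it assumes the array was created with a dense boolean mask and a soft mask, as in the timings above. It bypasses whatever bookkeeping __setitem__ performs, so it says nothing about whether the shortcut is safe in general.

```python
import numpy as np
import numpy.ma as ma


def make_masked(shape):
    # Hypothetical helper: a fully masked array with a dense boolean mask,
    # mirroring get_maskedarray() from the post.
    return ma.MaskedArray(np.zeros(shape, dtype=float),
                          mask=np.ones(shape, dtype=bool))


def fill_direct(a, items):
    # Hypothetical helper: write values through the underlying data and
    # mask arrays rather than through MaskedArray.__setitem__.
    # Assumption: `a` carries a dense mask, not ma.nomask.
    d = ma.getdata(a)   # plain ndarray view of the data
    m = ma.getmask(a)   # the mask array, or ma.nomask
    if m is ma.nomask:
        raise ValueError("expected a dense mask")
    for (i, j), value in items:
        d[i, j] = value
        m[i, j] = False  # unmask the element that was just filled
    return a


a = make_masked((4, 3))
fill_direct(a, [((0, 0), 1.0), ((2, 1), 5.0)])
```

After the call, only the two filled elements are unmasked; everything else stays masked, which is the state the element-by-element loop over a[i,j] = value would have produced.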