[Numpy-discussion] MaskedArray __setitem__ Performance

Alexander Michael lxander.m@gmail....
Sat Feb 16 13:23:26 CST 2008


On Feb 16, 2008 12:25 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
> Alexander,
> You get the gist here: process your _data and _mask separately and recombine
> them into a MaskedArray at the end. That way, you'll skip most of the
> overhead costs brought by some tests in the package (in __getitem__,
> __setitem__...).
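Pierre's point can be checked with a small timing sketch, assuming a recent NumPy. The sizes, repeat count, and the function names `via_masked_array` and `via_plain_arrays` are hypothetical. Both functions write the same values element by element: one goes through `MaskedArray.__setitem__`, the other writes into plain `ndarray` buffers and wraps them in a `MaskedArray` once at the end. The first is expected to be slower, but the exact ratio will depend on the NumPy version and the machine.

```python
import timeit
import numpy

n = 10_000                                  # hypothetical size, for illustration only
values = numpy.arange(n, dtype=float)

def via_masked_array():
    # Element-by-element writes through MaskedArray.__setitem__.
    a = numpy.ma.MaskedArray(numpy.zeros(n), numpy.zeros(n, bool))
    for i in range(n):
        a[i] = values[i]
    return a

def via_plain_arrays():
    # The same writes into plain ndarrays, recombined into a MaskedArray once.
    d = numpy.zeros(n)
    m = numpy.zeros(n, bool)
    for i in range(n):
        d[i] = values[i]
        m[i] = False                        # explicitly mark the element as valid
    return numpy.ma.MaskedArray(d, m)

t_ma = timeit.timeit(via_masked_array, number=5)
t_plain = timeit.timeit(via_plain_arrays, number=5)
print(f"MaskedArray.__setitem__: {t_ma:.3f} s, plain ndarrays: {t_plain:.3f} s")
```

Both functions produce the same masked array, so the comparison isolates the cost of the per-element checks in `__setitem__`.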

Can I safely carry around the data, mask and MaskedArray? I'm
considering working along the lines of the following conceptual
outline:

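import numpy

d = numpy.empty(shape, dtype)    # data buffer
m = numpy.zeros(shape, bool)     # mask buffer, all False (nothing masked)
a = numpy.ma.MaskedArray(d, m)   # wrapper over d and m; copy=False by default

load_initial_data(d, m)          # fill d and m in place

for update in updates:
    apply_update(update, d, m)   # must modify d and m in place, not rebind them
    result = calculate_result(a) # a sees the updated d and m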
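A minimal check of the first outline, assuming a recent NumPy: with the default copy=False, `numpy.ma.MaskedArray(d, m)` should wrap the existing buffers rather than copies of them, so in-place writes to `d` and `m` show up through `a`. The shape and values below are made up for illustration. The last lines show the main pitfall: rebinding a name to a new array detaches it from `a`.

```python
import numpy

shape = (4, 3)                      # hypothetical size, for illustration only
d = numpy.zeros(shape, float)       # data buffer
m = numpy.zeros(shape, bool)        # mask buffer: False means "not masked"
a = numpy.ma.MaskedArray(d, m)      # default copy=False: wraps d and m

# Check that no copies were made, rather than assuming it.
shared = (numpy.shares_memory(a.data, d), numpy.shares_memory(a.mask, m))
print(shared)

# Update the plain ndarrays in place, bypassing MaskedArray.__setitem__.
d[0, :] = [1.0, 2.0, 3.0]
m[0, 1] = True                      # mask out the 2.0

print(a.sum())                      # the in-place updates are visible through a

# Rebinding the name (instead of writing into the buffer) detaches it from a.
d = d + 10.0                        # new array; a still wraps the old buffer
print(a.sum())                      # unchanged by the line above
```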

I guess the alternative would look like this:

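d = numpy.empty(shape, dtype)    # data buffer
m = numpy.zeros(shape, bool)     # mask buffer, all False (nothing masked)

load_initial_data(d, m)

for update in updates:
    apply_update(update, d, m)
    a = numpy.ma.MaskedArray(d, m)   # rebuilt on every pass (no copy by default)
    result = calculate_result(a)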

Perhaps this is cleaner in some ways, but I'm trying to squeeze the
most performance out of the basic update loop I've sketched, so that
the calculate_result function can afford to trade some performance
for clarity and simplicity (if desired). I haven't yet measured the
overhead of creating a MaskedArray, but there probably isn't much,
since by default no copies are made.
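One way to measure that overhead is a quick `timeit` sketch, assuming a recent NumPy. The array size and repeat count are arbitrary, the helper names `wrap` and `wrap_copy` are hypothetical, and the timings will vary by machine. Passing copy=True gives a reference point for what copying both buffers would cost.

```python
import timeit
import numpy

shape = (1000, 100)                 # hypothetical size, for illustration only
d = numpy.zeros(shape, float)
m = numpy.zeros(shape, bool)

def wrap():
    # Default copy=False: reuse the existing d and m buffers.
    return numpy.ma.MaskedArray(d, m)

def wrap_copy():
    # copy=True: duplicate both buffers, for comparison.
    return numpy.ma.MaskedArray(d, m, copy=True)

n = 1000
per_wrap = timeit.timeit(wrap, number=n) / n
per_copy = timeit.timeit(wrap_copy, number=n) / n
print(f"no copy: {per_wrap * 1e6:.1f} us per call")
print(f"copy:    {per_copy * 1e6:.1f} us per call")
```

If the no-copy figure is small next to the cost of calculate_result, rebuilding the MaskedArray on each pass (the second outline) costs little and avoids having to keep the wrapper in sync with the buffers.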

Thanks for your advice,
Alex

