[Numpy-discussion] Are masked arrays slower for processing than ndarrays?
Sat May 9 19:37:40 CDT 2009
On May 9, 2009, at 8:17 PM, Eric Firing wrote:
> Eric Firing wrote:
> A part of the slowdown is what looks to me like unnecessary copying
> in _MaskedBinaryOperation.__call__. It is using getdata, which
> applies numpy.array to its input, forcing a copy. I think the copy
> is actually unintentional, in at least one sense, and possibly two:
> first, because the default argument of getattr is always evaluated,
> even if it is not needed; and second, because the call to np.array
> is used where np.asarray or equivalent would suffice.
Yep, good call. The try/except should be better, and yes, I forgot to
force copy=False (thought it was on by default...). I didn't know that
getattr always evaluated the default; the docs are scarce on that.
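
To make the two points above concrete, here is a small sketch. It is not the numpy.ma code: expensive_default and HasData are hypothetical names used only for illustration. It shows that the default argument passed to getattr is evaluated even when the attribute exists, and that np.array copies an existing ndarray by default while np.asarray does not.

```python
import numpy as np

calls = []

def expensive_default():
    # Hypothetical stand-in for a costly fallback (e.g. building a new array).
    calls.append(1)
    return None

class HasData:
    # Hypothetical object that already carries the attribute we look up.
    _data = np.arange(3)

obj = HasData()

# The default expression runs before getattr is called, so the fallback
# is paid for even though obj._data exists and the default is unused.
d1 = getattr(obj, "_data", expensive_default())
eager_calls = len(calls)

# With try/except the fallback only runs when the attribute is missing.
try:
    d2 = obj._data
except AttributeError:
    d2 = expensive_default()
lazy_calls = len(calls) - eager_calls

a = np.arange(5)
copied = np.array(a)    # np.array copies by default: a new buffer
shared = np.asarray(a)  # an existing ndarray is passed through unchanged

is_copy = not np.shares_memory(copied, a)
is_same = shared is a
```

Here eager_calls ends up as 1 and lazy_calls as 0, which is the whole argument for preferring try/except in a hot path.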
> ... I pressed "send" too soon. There are test failures with the
> patch I attached to my last message. I think the basic ideas are
> correct, but evidently there are wrinkles to be worked out. Maybe
> putmask() has to be used instead of where() (putmask is much faster)
> to maintain the ability to do *= and similar, and maybe there are
> other adjustments. Somehow, though, it should be possible to get
> decent speed for simple multiplication and division; a 10x penalty
> relative to ndarray operations is just too much.
Quite agreed. It was a shock to realize that we were that slow. I'm
gonna have to start testing w/ large arrays...
I'm confident we can significantly speed up the _MaskedOperations
without losing any of the features. Yes, putmask may be a better
option. We could probably use the following MO:
* result = a.data/b.data
* putmask(result, m, a)
However, I'm gonna need a good couple of weeks before being able to
really look into it...
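
The two-step MO listed above could look roughly like the sketch below. This is only an illustration under stated assumptions, not what numpy.ma does or will do: masked_divide_sketch is a hypothetical helper, the combined mask m is assumed to be the union of both input masks plus the division-by-zero positions, and the raw data of a (rather than the masked array itself) is what gets written back under the mask.

```python
import numpy as np

def masked_divide_sketch(a, b):
    """Hypothetical helper: divide on the raw data, then patch with putmask."""
    a = np.ma.asarray(a)
    b = np.ma.asarray(b)
    da = np.ma.getdata(a)
    db = np.ma.getdata(b)

    # Assumed mask: masked in either input, or dividing by zero.
    m = np.ma.getmaskarray(a) | np.ma.getmaskarray(b) | (db == 0)

    # Step 1: plain ndarray division, silencing warnings for bad slots.
    with np.errstate(divide="ignore", invalid="ignore"):
        result = da / db

    # Step 2: overwrite the invalid slots in place with a's data,
    # instead of allocating a second array as where() would.
    np.putmask(result, m, da)

    return np.ma.array(result, mask=m)

x = np.ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
y = np.ma.array([2.0, 2.0, 0.0, 4.0])
q = masked_divide_sketch(x, y)
```

The sketch assumes both inputs have the same shape and a floating-point dtype; broadcasting and integer inputs would need extra care.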