[Numpy-discussion] masked ufuncs in C: on github
Eric Firing
efiring@hawaii....
Fri May 15 20:48:50 CDT 2009
http://www.mail-archive.com/numpy-discussion@scipy.org/msg17595.html
Prompted by the thread above, I decided to see what it would take to
implement ufuncs with masking in C. I described the result here:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg17698.html
Now I am starting a new thread. The present state of the work is on
github: http://github.com/efiring/numpy-work/tree/cfastma
I don't want to do any more until I have gotten some feedback from core
developers. (And I would be delighted if someone wants to help with
this, or take it over.)
1) The strategy I have started with is to make a full set of masked
ufuncs alongside the existing ones, appending "_m" to their names. Only
the binary ufuncs are implemented now, but the unary ufuncs can be
handled similarly. Example:
multiply(x, y, out) # present ufunc: no change
multiply_m(x, y, mask, out) # new
Where mask is True, the operation is skipped.
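To try out the intended semantics without building the branch, the
"skip where mask is True" behavior can be emulated roughly in plain
NumPy. This is only a sketch: multiply_m_sketch is a hypothetical name,
not a function in the branch, and it assumes all arrays already have the
same shape (no broadcasting).

```python
import numpy as np

def multiply_m_sketch(x, y, mask, out):
    """Hypothetical stand-in for multiply_m: skip elements where mask is True."""
    keep = ~np.asarray(mask, dtype=bool)
    # Only unmasked slots are computed and written;
    # masked slots of `out` keep whatever they held before the call.
    out[keep] = np.asarray(x)[keep] * np.asarray(y)[keep]
    return out

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 20.0, 30.0, 40.0])
mask = np.array([False, True, False, True])
out = np.full(4, -1.0)  # sentinel value makes the skipped slots visible
multiply_m_sketch(x, y, mask, out)
# out now holds 10, -1, 90, -1: positions 1 and 3 were skipped
```

The point of doing this in C is to avoid the temporary arrays and extra
passes over the data that the indexing above creates.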
2) I have in mind the possibility of supporting two input masks and one
output mask for binary operations. This would look like:
multiply_mm(x, y, maskx, masky, out, outmask)
outmask would be the logical_or of maskx and masky, and in the case of
domained operations it would also be True where the arguments are
outside the domain.
This form would provide the fastest support for masked arrays, but would
also take quite a bit more work, and would expand the namespace even
more. I'm not sure it's worth it.
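The two-mask form in point 2 can likewise be sketched in plain NumPy.
The names below are hypothetical, and treating y == 0 as "outside the
domain" for division is an assumption made for illustration; the domain
tests in numpy.ma or in any eventual C implementation may differ. The
arrays are assumed to have matching shapes.

```python
import numpy as np

def multiply_mm_sketch(x, y, maskx, masky, out, outmask):
    """Hypothetical stand-in for multiply_mm (no domain restriction)."""
    outmask[...] = np.logical_or(maskx, masky)
    keep = ~outmask
    out[keep] = x[keep] * y[keep]
    return out, outmask

def divide_mm_sketch(x, y, maskx, masky, out, outmask):
    """Hypothetical domained variant: a zero divisor is also masked."""
    # Assumption: the domain test is simply y == 0.
    outmask[...] = np.logical_or(maskx, masky) | (y == 0)
    keep = ~outmask
    # Only in-domain, unmasked elements are computed.
    out[keep] = x[keep] / y[keep]
    return out, outmask

xd = np.array([1.0, 2.0, 3.0, 4.0])
yd = np.array([2.0, 0.0, 4.0, 5.0])
maskx = np.array([False, False, True, False])
masky = np.zeros(4, dtype=bool)
quot = np.full(4, -1.0)             # sentinel marks skipped slots
quotmask = np.zeros(4, dtype=bool)
divide_mm_sketch(xd, yd, maskx, masky, quot, quotmask)
# quotmask is True at index 1 (zero divisor) and index 2 (masked input)
```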
3) I have not yet taken any steps to modify numpy.ma to take advantage
of the new ufuncs, but I think that will be quite simple.
4) Likewise, to save time, I am now just borrowing the regular ufunc
docstrings.
5) No tests yet, Stefan. They can be added as soon as there is
agreement on API and general strategy.
6) The present implementation is based on conceptually small
modifications of the existing numpy code generation system. It required
a lot of cut and paste, and yields a lot of nearly duplicated code.
There may be better ways to do it--especially if it turns out it needs
to be redone in some modified form.
Eric