[Numpy-discussion] How to median filter a masked array?
Russell E Owen
rowen at u.washington.edu
Wed Jul 14 10:44:39 CDT 2004
At 9:50 AM -0700 2004-07-14, Paul F. Dubois wrote:
>The median filter is prepared to take an argument of a numarray
>array but ignorant of and unprepared to deal with masked values.
>Using the __array__ trick, both Numeric.MA and numarray.ma would
>'know' this and therefore replace the missing values in the filter's
>argument with the 'fill value' for that type -- a big number in the
>case of real arrays. You could explicitly choose that value (say
>using the overall median of the data m) by passing x.filled(m)
>rather than x to the filter.
>
>If there is no such value, you probably do have to do it in C. If
>you wrote it in C, how would you treat missing elements? BTW it
>wouldn't be that hard; just pass both the array and its mask as
>separate elements to a C routine and use SWIG to hook it up.
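For reference, the fill-value suggestion quoted above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses plain Python lists and the standard library instead of numarray, it is 1-d for brevity, and the names fill_masked and median_filter_1d are hypothetical helpers, not part of Numeric, numarray or any existing filter package.

```python
from statistics import median

def fill_masked(data, mask, fill_value=None):
    """Return a copy of data with masked entries replaced by fill_value.

    mask[i] is True where data[i] is missing. If fill_value is None,
    the median of the unmasked values is used (assumes at least one
    value is unmasked).
    """
    if fill_value is None:
        good = [d for d, m in zip(data, mask) if not m]
        fill_value = median(good)
    return [fill_value if m else d for d, m in zip(data, mask)]

def median_filter_1d(data, size=3):
    """Mask-unaware median filter; the window is truncated at the edges."""
    half = size // 2
    n = len(data)
    return [median(data[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]

# The bad value 100 is masked, replaced by the overall median of the
# good data, and only then handed to the ordinary filter.
data = [1, 2, 100, 4, 5]
mask = [False, False, True, False, False]
smoothed = median_filter_1d(fill_masked(data, mask))
```

The filter itself never sees the mask, so the result near a masked element depends on the fill value chosen.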
I already have routines that handle masked data in C to create
radial profiles from 2-d integer data (since I could not figure out
how to do that in numarray). I chose to pass the mask as a separate
array, since I could not find any C interface for numarray.ma and
since NaN made no sense for integer data.
That code was pretty straightforward. I wish I could have found a
simple way to support multiple array types. I thought using C++ with
templates would be the ticket, but absent any examples and after
looking through the numarray code, I gave up and took the easy way
out. (I didn't use SWIG, though; I just hand-coded everything. Maybe
that was a mistake.)
I confess that makes me worry about the underpinnings of numarray. It
seems an obvious candidate to be written in C++ with templates. I
hate to think what the developers have to go through instead.
In any case, writing a median filter is a bigger deal than taking a
radial profile, and since one already existed I thought I'd ask.
>I doubt NaN would help you here; you'd still have to figure out what
>to do in those places. Numeric did not have support for NaN because
>there were portability problems. Probably still are. And you still
>are stuck in a lot of cases anyway.
Well, NaN isn't very general in any case, since it's meaningless for
integer data. So maybe that's a red herring. (Though if NaN had
worked to mask data I would cheerfully have converted my images to
floats to take advantage of it!)
What's really wanted is a more unified approach to masked data. I
suppose it's pie in the sky, but I sure wish most of the numarray
functions took an optional mask array (or accepted a numarray.ma
object -- nice for the user, but probably too painful for words under
the hood).
I don't think there are major issues with what to do with masked
data. Simply ignoring it works in most cases, e.g. mean, std dev,
sum, max... In some cases one needs the new mask as output (e.g.
matrix multiply). Filtering is a bit subtle: can masked data be
treated the same as data off the edge? I hope so, but I'm not sure.
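To make the filtering question concrete, here is a minimal sketch of a median filter that takes the mask as a separate array and treats masked pixels exactly like pixels off the edge: both are simply left out of the window. It is an assumption-laden illustration, not an existing numarray routine. It uses plain nested lists and the standard library, the name masked_median_filter is hypothetical, and returning a new mask for windows with no valid pixel is one possible choice among several.

```python
from statistics import median

def masked_median_filter(image, mask, size=3):
    """Median-filter a 2-d list of rows, ignoring masked pixels.

    mask[r][c] is True where image[r][c] is missing. Masked pixels and
    pixels off the edge are both left out of the window.
    Returns (filtered, out_mask); out_mask is True where the window
    held no valid pixel at all (filtered is 0 there, as a placeholder).
    """
    half = size // 2
    nrows, ncols = len(image), len(image[0])
    filtered = [[0] * ncols for _ in range(nrows)]
    out_mask = [[False] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(nrows, r + half + 1))
                for cc in range(max(0, c - half), min(ncols, c + half + 1))
                if not mask[rr][cc]
            ]
            if window:
                filtered[r][c] = median(window)
            else:
                out_mask[r][c] = True
    return filtered, out_mask

# The bad centre pixel (999) is masked and never enters any window.
image = [[1, 2, 3],
         [4, 999, 6],
         [7, 8, 9]]
mask = [[False, False, False],
        [False, True, False],
        [False, False, False]]
filtered, out_mask = masked_median_filter(image, mask)
```

With this choice the masked centre pixel still gets an output value (the median of its valid neighbours); whether that is desirable, or whether the input mask should be carried through unchanged, is exactly the kind of subtlety mentioned above.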
Anyway, I am grateful for what we do have. Without Numeric or
numarray I would have to write all my image processing code in a
different language.
-- Russell