[Numpy-discussion] How to median filter a masked array?

Russell E Owen rowen at u.washington.edu
Wed Jul 14 10:44:39 CDT 2004

At 9:50 AM -0700 2004-07-14, Paul F. Dubois wrote:
>The median filter is prepared to take an argument of a numarray 
>array but ignorant of and unprepared to deal with masked values. 
>Using the __array__ trick, both Numeric.MA and numarray.ma would 
>'know' this and therefore replace the missing values in the filter's 
>argument with the 'fill value' for that type -- a big number in the 
>case of real arrays. You could explicitly choose that value (say 
>using the overall median of the data m) by passing x.filled(m) 
>rather than x to the filter.
>If there is no such value, you probably do have to do it in C. If 
>you wrote it in C, how would you treat missing elements? BTW it 
>wouldn't be that hard; just pass both the array and its mask as 
>separate elements to a C routine and use SWIG to hook it up.
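
For anyone following along, here is a minimal sketch of the x.filled(m) 
suggestion above. It uses the modern numpy.ma module as a stand-in for 
numarray.ma, and median_filter_1d is a hypothetical helper written just 
for this example (it is not the numarray filter); it only shows a 
mask-ignorant filter receiving data whose masked entries were filled 
with the overall median first.

```python
import numpy as np

def median_filter_1d(values, size=3):
    """Plain sliding-window median (hypothetical stand-in filter).

    Knows nothing about masks; edges are handled by repeating the
    first and last samples.
    """
    half = size // 2
    padded = np.pad(values, half, mode="edge")
    return np.array([np.median(padded[i:i + size])
                     for i in range(len(values))])

# One bad sample (900.0) is masked out.
data = np.ma.masked_array([1.0, 2.0, 900.0, 4.0, 5.0],
                          mask=[False, False, True, False, False])

# Median of the unmasked values only.
m = float(np.ma.median(data))

# Replace masked entries with m, then hand a plain array to the filter.
filtered = median_filter_1d(data.filled(m))
```

The filter never sees the mask, so the result carries no record of which 
samples were originally bad; the caller has to keep the mask around if 
that matters.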

I already have routines that handle masked data in C to create 
radial profiles from 2-d integer data (since I could not figure out 
how to do that in numarray). I chose to pass the mask as a separate 
array, since I could not find any C interface for numarray.ma and 
since NaN made no sense for integer data.
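
To make the "mask as a separate array" idea concrete, here is a rough 
sketch in Python with modern NumPy. It is not my C code, and the function 
name, the integer-radius binning, and the choice of a per-bin mean are 
all assumptions made for the example.

```python
import numpy as np

def radial_profile(image, mask, center):
    """Mean of unmasked pixels in integer-radius bins around center.

    image  : 2-d array (integer data is fine)
    mask   : 2-d bool array, True where a pixel is bad
    center : (row, col) of the profile center
    Returns (profile, counts); bins with no good pixels have count 0
    and a profile value of 0.0.
    """
    yy, xx = np.indices(image.shape)
    # Truncate to int, so bin k holds radii in [k, k + 1).
    radius = np.hypot(yy - center[0], xx - center[1]).astype(int)
    good = ~mask
    nbins = int(radius.max()) + 1
    sums = np.bincount(radius[good], weights=image[good], minlength=nbins)
    counts = np.bincount(radius[good], minlength=nbins)
    profile = np.zeros(nbins)
    np.divide(sums, counts, out=profile, where=counts > 0)
    return profile, counts

image = np.array([[1, 2, 1],
                  [2, 5, 2],
                  [1, 2, 99]])
mask = np.zeros(image.shape, dtype=bool)
mask[2, 2] = True          # flag the bad corner pixel
profile, counts = radial_profile(image, mask, center=(1, 1))
```

Returning the counts alongside the profile lets the caller spot empty 
bins without needing NaN, which suits integer input.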

That code was pretty straightforward. I wish I could have found a 
simple way to support multiple array types. I thought using C++ with 
prototypes would be the ticket, but absent any examples and after 
looking through the numarray code, I gave up and took the easy way 
out. (I didn't use SWIG, though; I just hand-coded everything. Maybe 
that was a mistake.)

I confess that makes me worry about the underpinnings of numarray. It 
seems an obvious candidate to be written in C++ with prototypes. I 
hate to think what the developers have to go through, instead.

In any case, writing a median filter is a bigger deal than taking a 
radial profile, and since one already existed I thought I'd ask.

>I doubt NaN would help you here; you'd still have to figure out what 
>to do in those places. Numeric did not have support for NaN because 
>there were portability problems. Probably still are. And you still 
>are stuck in a lot of cases anyway.

Well, NaN isn't very general in any case, since it's meaningless for 
integer data. So maybe that's a red herring. (Though if NaN had 
worked to mask data I would cheerfully have converted my images to 
floats to take advantage of it!)

What's really wanted is a more unified approach to masked data. I 
suppose it's pie in the sky, but I sure wish most of the numarray 
functions took an optional mask array (or accepted a numarray.ma 
object -- nice for the user, but probably too painful for words under 
the hood).
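
As a sketch of the kind of interface I mean, here is a function with an 
optional mask argument, written with modern NumPy. The name masked_mean 
and its signature are hypothetical, not an existing numarray API; 
numpy.ma is used at the end only to cross-check the result.

```python
import numpy as np

def masked_mean(values, mask=None):
    """Mean of values, skipping entries where mask is True.

    With mask=None this is an ordinary mean.
    """
    values = np.asarray(values, dtype=float)
    if mask is None:
        return values.mean()
    good = ~np.asarray(mask, dtype=bool)
    return values[good].mean()

plain = masked_mean([1, 2, 3])
skipping = masked_mean([1, 2, 300], mask=[False, False, True])

# The same answer from a masked-array object.
via_ma = float(np.ma.masked_array([1, 2, 300],
                                  mask=[False, False, True]).mean())
```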

I don't think there are major issues with what to do with masked 
data. Simply ignoring it works in most cases, e.g. mean, std dev, 
sum, max... In some cases one needs the new mask as output (e.g. 
matrix multiply). Filtering is a bit subtle: can masked data be 
treated the same as data off the edge? I hope so, but I'm not sure.
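
Here is one possible reading of "treat masked data like data off the 
edge", sketched with modern NumPy: each window is clipped at the image 
border, masked pixels are dropped from it, and the median is taken over 
whatever is left. The function name and the rule of masking an output 
pixel only when its window has no good pixels are assumptions for the 
example, and the plain Python loops are for clarity, not speed.

```python
import numpy as np

def masked_median_filter(image, mask, size=3):
    """Median filter that ignores masked and off-edge pixels.

    image : 2-d array
    mask  : 2-d bool array, True where a pixel is bad
    Returns (filtered, out_mask); out_mask is True where the window
    contained no good pixels at all.
    """
    half = size // 2
    rows, cols = image.shape
    out = np.zeros(image.shape, dtype=float)
    out_mask = np.zeros(image.shape, dtype=bool)
    for r in range(rows):
        for c in range(cols):
            # Clip the window at the image border.
            r0, r1 = max(r - half, 0), min(r + half + 1, rows)
            c0, c1 = max(c - half, 0), min(c + half + 1, cols)
            window = image[r0:r1, c0:c1]
            good = ~mask[r0:r1, c0:c1]
            if good.any():
                out[r, c] = np.median(window[good])
            else:
                out_mask[r, c] = True
    return out, out_mask

image = np.array([[1, 2, 3],
                  [4, 500, 6],
                  [7, 8, 9]])
mask = np.zeros(image.shape, dtype=bool)
mask[1, 1] = True          # the outlier in the middle is bad
filtered, out_mask = masked_median_filter(image, mask)
```

With an even number of good pixels in a window, np.median averages the 
two middle values, so the output is float even for integer input.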

Anyway, I am grateful for what we do have. Without Numeric or 
numarray I would have to write all my image processing code in a 
different language.

-- Russell
