[Numpy-discussion] masked arrays and nd_image
Rick White
rlw at stsci.edu
Wed Aug 11 12:43:01 CDT 2004
On Wed, 11 Aug 2004, Stephen Walton wrote:
> I hope I'm not jumping in where I don't belong, but here at SFO/CSUN
> we've had quite a bit of experience with convolutions and correlations
> of time series (serieses?) with missing data points.
>
> I'm sure you all know this, but: For the scaling to be correct, you
> have to not only mask out the values you don't want, but normalize the
> sum to reflect the fact that different numbers of values will appear in
> the sum. Our MATLAB code to convolve x and y, with bad points marked by
> NaNs, is:
>
> for i = 1 : xlen-ylen+1
> j(i)=i;
> x1=x(i:i+ylen-1);
> a=x1.*y;
> b=a(find(~isnan(a)));
> if isempty(b)
> z(i)= NaN;
> else
> z(i)=sum(b)/length(b)
> end
> end
>
> I'd be happy to know how to code up the equivalent in numarray. In the
> above, note that x1 is x padded on both ends with ylen-1 NaNs.
If you have a mask array with ones marking the good points and zeros the
points to ignore, here is a simpler version that does pretty much the same
calculation:
import numarray as na
kernel = na.ones(ylen)
z = nd.convolve1d(x*mask,kernel,mode='constant') / \
nd.convolve1d(mask,kernel,mode='constant')
This would need some elaboration to work with your data -- e.g., NaN*0 is
not zero so you don't want the x array to have NaN values in it. And it
may need an extra test to handle the case where a region larger than ylen
is masked out (this version would put in NaN values, which might be OK).
But the basic idea is the same.
I imagine this is possible in Matlab (which I don't use) -- I use it all
the time in IDL to do convolutions in the presence of missing data.
> Unfortunately, and again I'm sure everyone knows this, you can't use
> FFTs to speed up convolutions/correlations if you have missing data
> points, so you have to use discrete techniques. The numerical analysis
> literature refers to this problem as Fourier analysis of unequally
> spaced data. The only publications and algorithms I could find went the
> wrong way: given an unequally spaced set of points in Fourier space,
> find the most likely reconstruction in real space.
This can in fact be done using FFTs. You have to be careful about how the
boundary conditions are handled, adding padding cells to avoid wraparound
(that's what the mode='constant' is about). The approach is very similar
though. This works in 2-D as well and is vastly faster than doing a
direct convolution.
Incidentally, if you have truly unequally spaced data points (as opposed
to missing points out of an equally spaced series), this particular trick
doesn't work but there is another approach to doing DFTs in FFT time.
It's a bit lengthy to explain (and off the path for this group), but radio
astronomers figured out how to do it a long time ago. It is probably the
algorithm you mention using unequally-spaced points in Fourier space; the
same algorithm works fine if the points are in real space since there is
hardly any difference between a forward and reverse Fourier transform.
Rick
More information about the Numpy-discussion
mailing list