[Numpy-discussion] Missing/accumulating data

Charles R Harris charlesr.harris@gmail....
Tue Jun 28 12:34:38 CDT 2011

On Tue, Jun 28, 2011 at 10:50 AM, Joe Harrington <jh@physics.ucf.edu> wrote:

> As with Travis, I have not had time to wade through the 150+ messages
> on masked arrays, but I'd like to raise a concept I've mentioned in
> the past that would enable a broader use if done slightly differently.
> That is, the "masked array" problem is a subset of this more-general
> problem.  Please respond on this thread if you want me to see the
> replies.
> The problem is of data accumulation and averaging, and it comes up all
> the time in astronomy, medical imaging, and other fields that handle
> stacks of images.  Say that I observe a planet, taking images at, say,
> 30 Hz for several hours, as with many video cameras.  I want to
> register or map those images to a common coordinate system, stack
> them, and average them.  I must recognize:
> 1. transient there are bad pixels due to cosmic ray hits on the CCD array
> 2. some pixels are permanently dead and always produce bad data
> 3. some data in the image is not part of the planet
> 4. some images have too much atmospheric blurring to be used
> 5. sometimes moons of planets like Jupiter pass in front
> 6. etc.
> For each original image:
>  Make a mask where 1 is good and 0 is bad.  This trims both bad
>      pixels and pixels that are not part of the planet.
>  Evaluate the images and zero the entire image (or regions of an
>      image) if quality is low (or there is an obstruction).
>  Shift (or map, in the case of lon-lat mapping) the image AND THE
>      MASK to a common coordinate grid.
>  Append the image and the mask to the ends of the stacks of prior
>      images and masks.
> For each x and y position in the image/mask stacks:
>  Perform some averaging of all pixels in the stack that are not masked,
>      or otherwise use the mask info to decide how to use the image info.
> The most common of these averages is to multiply the stack of images
> by the stack of masks, zeroing the bad data, then sum both images and
> masks down the third axis.  This produces an accumulator image and a
> count image, where the count image says how many data points went into
> the accumulator at each pixel position.  Divide the accumulator by the
> count.  This produces an image with no missing data and the highest
> SNR per pixel possible.
> Two things differ from the current ma package: the sense of TRUE/FALSE
> and the ability to have numbers other than 0 or 1.  For example, if
> data quality were part of the assessment, each pixel could get a data
> quality number and the accumulation would only include pixels with
> better than a certain number, or there could be weighted averaging.
> Currently, we just implement this as described, without using the
> masked array capability at all.  However, it's cumbersome to hand
> around pairs of objects, and making a new object to handle them
> together means breaking it apart when calling standard routines that
> don't handle masks.  There are probably some very fast ways to do
> interesting kinds of averages that we haven't explored but would if
> this kind of capability were in the infrastructure.
> So, I hope that if some kind of masking goes into the array object, it
> allows TRUE=good, FALSE=bad, and at least an integer count if not a
> floating quality number.  One bit is not enough.  Of course, these
> could be options.  Certainly adding 7 or 31 more bits to the mask will
> make space problems for some, as would flipping the sense of the
> mask.  However, speed matters.  Some of those doing this kind of
> processing are handling hours of staring video data, which is 1e6
> frames a night from each camera.
This might be described as masks as weights, or perhaps from the Bayesian
point of view, as measures of just how unknown unknown data is. One thing I
like about masks is that they allow ideas like this to creep in and thus
encourage future innovations. But as you also point out, 8 bits might be a
bit on the low side for that and a consistent idea of how to interpret
things would help ala the missing or unknown models.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20110628/381c0028/attachment.html 

More information about the NumPy-Discussion mailing list