[Numpy-discussion] Missing/accumulating data

Mark Wiebe mwwiebe@gmail....
Wed Jun 29 12:48:33 CDT 2011


Yeah, it takes a long time to wade through and respond to everything.

I think the "missing data" problem and weighted masking are closely related,
but neither one is fully a subset of the other. With a non-boolean alpha
mask, there's an implication of a multiplication operator in there
somewhere, but with a boolean mask, the data can be any data whatsoever that
doesn't necessarily support any kind of blending operations.

For the image accumulation you're describing, I would use either a
structured array with 'color' and 'weight' fields, or have the last element
of the color channel be the weight (like an RGBA image) so adding multiple
weighted images together would add both the colors and the weights
simultaneously, without requiring a ufunc extension supporting struct
dtypes.

Cheers,
Mark

On Tue, Jun 28, 2011 at 11:50 AM, Joe Harrington <jh@physics.ucf.edu> wrote:

> As with Travis, I have not had time to wade through the 150+ messages
> on masked arrays, but I'd like to raise a concept I've mentioned in
> the past that would enable a broader use if done slightly differently.
> That is, the "masked array" problem is a subset of this more-general
> problem.  Please respond on this thread if you want me to see the
> replies.
>
> The problem is of data accumulation and averaging, and it comes up all
> the time in astronomy, medical imaging, and other fields that handle
> stacks of images.  Say that I observe a planet, taking images at, say,
> 30 Hz for several hours, as with many video cameras.  I want to
> register or map those images to a common coordinate system, stack
> them, and average them.  I must recognize:
>
> 1. transient there are bad pixels due to cosmic ray hits on the CCD array
> 2. some pixels are permanently dead and always produce bad data
> 3. some data in the image is not part of the planet
> 4. some images have too much atmospheric blurring to be used
> 5. sometimes moons of planets like Jupiter pass in front
> 6. etc.
>
> For each original image:
>  Make a mask where 1 is good and 0 is bad.  This trims both bad
>      pixels and pixels that are not part of the planet.
>  Evaluate the images and zero the entire image (or regions of an
>      image) if quality is low (or there is an obstruction).
>  Shift (or map, in the case of lon-lat mapping) the image AND THE
>      MASK to a common coordinate grid.
>  Append the image and the mask to the ends of the stacks of prior
>      images and masks.
>
> For each x and y position in the image/mask stacks:
>  Perform some averaging of all pixels in the stack that are not masked,
>      or otherwise use the mask info to decide how to use the image info.
>
> The most common of these averages is to multiply the stack of images
> by the stack of masks, zeroing the bad data, then sum both images and
> masks down the third axis.  This produces an accumulator image and a
> count image, where the count image says how many data points went into
> the accumulator at each pixel position.  Divide the accumulator by the
> count.  This produces an image with no missing data and the highest
> SNR per pixel possible.
>
> Two things differ from the current ma package: the sense of TRUE/FALSE
> and the ability to have numbers other than 0 or 1.  For example, if
> data quality were part of the assessment, each pixel could get a data
> quality number and the accumulation would only include pixels with
> better than a certain number, or there could be weighted averaging.
>
> Currently, we just implement this as described, without using the
> masked array capability at all.  However, it's cumbersome to hand
> around pairs of objects, and making a new object to handle them
> together means breaking it apart when calling standard routines that
> don't handle masks.  There are probably some very fast ways to do
> interesting kinds of averages that we haven't explored but would if
> this kind of capability were in the infrastructure.
>
> So, I hope that if some kind of masking goes into the array object, it
> allows TRUE=good, FALSE=bad, and at least an integer count if not a
> floating quality number.  One bit is not enough.  Of course, these
> could be options.  Certainly adding 7 or 31 more bits to the mask will
> make space problems for some, as would flipping the sense of the
> mask.  However, speed matters.  Some of those doing this kind of
> processing are handling hours of staring video data, which is 1e6
> frames a night from each camera.
>
> --jh--
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20110629/654b11a0/attachment-0001.html 


More information about the NumPy-Discussion mailing list