[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Matthew Brett matthew.brett@gmail....
Fri Jun 24 19:02:07 CDT 2011


Hi,

On Sat, Jun 25, 2011 at 12:22 AM, Wes McKinney <wesmckinn@gmail.com> wrote:
...
> Perhaps we should make a wiki page someplace summarizing pros and cons
> of the various implementation approaches?

But we should do this only if it really is an open question which one
we go for.  If not, then we're just slowing Mark down in getting to
the implementation.

Assuming the question is still open, here's a starter for the pros and cons:

array.mask
1) It's easier / neater to implement
2) It can generalize across dtypes
3) You can still get the masked data underneath the mask (allowing you
to unmask etc)
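
To make points 2 and 3 concrete, here is a rough sketch using the
existing numpy.ma module as a stand-in.  This is not the proposed
array.mask implementation, which isn't written yet; it only shows the
general behavior of a separate mask sitting on top of untouched data.
The variable names are made up for the illustration.

```python
import numpy as np

# Data plus a separate boolean mask (True means "masked out").
data = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, False])
masked = np.ma.masked_array(data, mask=mask)

# Reductions skip the masked element: (1 + 3 + 4) / 3.
masked_mean = masked.mean()

# Point 3: the value under the mask is still stored and recoverable.
underlying = masked.data.copy()

# "Unmasking" is just building a view of the same data with no mask.
unmasked = np.ma.masked_array(masked.data, mask=False)
unmasked_mean = unmasked.mean()

# Point 2: the same mechanism works for a dtype with no NaN, e.g. int32.
int_masked = np.ma.masked_array(
    np.array([1, 2, 3], dtype=np.int32), mask=[False, True, False]
)
int_sum = int_masked.sum()
```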

nafloat64:
1) No memory overhead
2) Battle-tested implementation already done in R
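
For comparison, a rough sketch of the in-band approach.  I'm using a
plain NaN as a stand-in for a dedicated NA bit pattern; there is no
nafloat64 dtype in numpy today, so this only illustrates the trade-off:
no extra storage, but the original value is overwritten.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Mark element 1 as missing by overwriting it with the sentinel.
with_na = values.copy()
with_na[1] = np.nan  # assumption: NaN stands in for a real NA pattern

# No memory overhead: the array is exactly as large as before.
same_size = with_na.nbytes == values.nbytes

# Missingness is detected by inspecting the values themselves.
is_na = np.isnan(with_na)

# NA-aware reductions have to skip the sentinel explicitly.
mean_skipping_na = np.nanmean(with_na)

# Unlike the mask approach, the original 2.0 cannot be recovered from
# with_na; it only survives here because we kept `values` around.
```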

I guess we'd have to test directly whether the non-contiguous memory
of the mask and data would cause enough cache-miss problems to
outweigh the potential cycle savings from the single-byte comparisons
in array.mask.

I guess that one and only one of these will get written.  I guess that
one of these choices may be a lot more satisfying to the current and
future masked array itch than the other.

I'm personally worried that the memory overhead of array.mask will
make many of us tend to avoid it.  I work with images that can
easily get large enough that I would not want an extra byte array,
with one byte per array item, added to my storage.
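
To put rough numbers on that, here is a back-of-envelope sketch.  It
assumes the mask costs one byte per item, which is my reading of the
proposal rather than a settled design; mask_overhead is a hypothetical
helper and the image shape is just an example.

```python
import math

import numpy as np


def mask_overhead(shape, dtype):
    """Return (data_bytes, mask_bytes, mask_bytes / data_bytes).

    Assumes a mask that stores one byte per array item.
    """
    n_items = math.prod(shape)
    data_bytes = n_items * np.dtype(dtype).itemsize
    mask_bytes = n_items  # one byte per item (assumption)
    return data_bytes, mask_bytes, mask_bytes / data_bytes


# Example image volume; the shape is made up for illustration.
shape = (512, 512, 200)

# For float64 the mask adds one eighth on top of the data.
float64_cost = mask_overhead(shape, np.float64)

# For uint8 image data the mask doubles the storage.
uint8_cost = mask_overhead(shape, np.uint8)
```

So the relative cost depends heavily on the dtype: modest for float64,
but a doubling for byte-sized image data.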

The reason I'm asking for more details about the implementation is
that it is most of the argument for array.mask at the moment (points
1 and 2 under array.mask above).

See you,

Matthew

