[Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Fri Jun 24 13:06:44 CDT 2011
On Fri, Jun 24, 2011 at 11:13 AM, Christopher Barker
> Nathaniel Smith wrote:
> >> The 'dtype factory' idea builds on the way I've structured datetime as a
> >> parameterized type,
> Another disadvantage is that we get further from Gael Varoquaux's point:
> >> Right now, the numpy array can be seen as an extension of the C
> >> array, basically a pointer, a data type, and a shape (and strides).
> >> This enables easy sharing with libraries that have not been
> >> written with numpy in mind.
> and also PEP 3118 support
> It is very useful that a numpy array has a pointer to a regular old C
> array -- if we introduce this special dtype, that will break (well, not
> really, but the C array would be of this particular struct).
> Granted, any other C code would probably have to do something with the
> mask anyway, but I still think it'd be better to keep that raw data
> array standard.
It's not actually a pointer to a C array; there is already a lot of checking
and possibly a copy/buffer required before you can treat it as such. The
data may be misaligned, have noncontiguous strides, have a non-C
multidimensional memory layout, or have a different byte order. Dealing with
all these special cases in a uniform way is one of the things the 1.6 nditer
helps a lot with.
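
To make the list of special cases concrete, here is a minimal sketch of the
checks a caller might run before handing the buffer to C code that expects a
plain C array. The helper names is_plain_c_array and as_plain_c_array are
hypothetical, made up for this illustration; the flags and np.require are
existing NumPy features.

```python
import numpy as np


def is_plain_c_array(arr):
    """True if arr's buffer can be read directly as a plain C array."""
    return bool(
        arr.flags.c_contiguous    # C (row-major) layout with no gaps
        and arr.flags.aligned     # data pointer aligned for the dtype
        and arr.dtype.isnative    # machine byte order
    )


def as_plain_c_array(arr):
    """Return arr unchanged if it already qualifies, else a converted copy."""
    native = arr.dtype.newbyteorder("=")
    return np.require(arr, dtype=native,
                      requirements=["C_CONTIGUOUS", "ALIGNED"])


base = np.arange(12, dtype=np.int32).reshape(3, 4)
strided = base[:, ::2]                             # noncontiguous strides
fortran = np.asfortranarray(base)                  # non-C memory layout
swapped = base.astype(base.dtype.newbyteorder())   # non-native byte order

for candidate in (base, strided, fortran, swapped):
    fixed = as_plain_c_array(candidate)
    print(is_plain_c_array(candidate), is_plain_c_array(fixed))
```

Only the first array passes the check as it stands; the other three each need
a copy before C code can walk the buffer as a flat row-major array.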
> This applies to switching between masked and not-masked numpy arrays
> also -- I don't think I'd want the performance hit of that requiring a
> data copy.
When performance is important, it is still possible to avoid that copy - by
adding the mask to a view of the original array. The mask= parameter to
ufuncs, something which is independent of arrays with masks, also provides a
way to do masked operations without ever touching masked arrays.
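
As a rough sketch of a masked ufunc call with no masked array involved: the
mask= parameter above is part of the proposal, so this example substitutes the
where= keyword that released NumPy ufuncs accept. That substitution is an
assumption made for illustration, and the polarity shown (True means "compute
this element") is that of where=, not necessarily of the proposed mask=.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# True marks the elements that take part in the operation.
valid = np.array([True, False, True, False])

# Elements where valid is False are left untouched in out, so out is
# initialized explicitly instead of being left as uninitialized memory.
out = np.zeros_like(a)
np.add(a, b, out=out, where=valid)

# Reductions accept the same keyword (NumPy 1.17 and later).
total = np.sum(a, where=valid)

print(out, total)
```

Neither a nor b is copied or wrapped here; the boolean array is the only
extra allocation besides the output.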
> Also the idea was posted here that you could use views to have the same
> data set with different masks -- that would break as well.
I'm not sure how this would break? I think that should work just fine.
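
For illustration only, the same idea can be sketched with the existing
numpy.ma module standing in for the proposed core masks (an assumption; the
proposal's own API is not shown here). Two masked arrays wrap one data buffer
without copying it, and each carries its own mask.

```python
import numpy as np

data = np.arange(6, dtype=float)          # [0, 1, 2, 3, 4, 5]

# np.ma.array does not copy an ndarray by default, so both masked arrays
# are views onto the same buffer; only the masks are separate.
view_a = np.ma.array(data, mask=[True, True, False, False, False, False])
view_b = np.ma.array(data, mask=[False, False, False, False, True, True])

sum_a_before = view_a.sum()               # 2 + 3 + 4 + 5
sum_b_before = view_b.sum()               # 0 + 1 + 2 + 3

# A write to the shared data is visible through both masked views.
data[2] = 100.0
sum_a_after = view_a.sum()
sum_b_after = view_b.sum()

print(sum_a_before, sum_b_before, sum_a_after, sum_b_after)
```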
> Nathaniel Smith wrote:
> > If we think that the memory overhead for floating point types is too
> > high, it would be easy to add a special case where maybe(float) used a
> > distinguished NaN instead of a separate boolean.
> That would be pretty cool, though in the past folks have made a good
> argument that even for floats, masks have significant advantages over
> "just using NaN". One might be that you can mask and unmask a value for
> different operations, without losing the value.
Especially with the ability to do the "hardmask" feature, this aspect of it
might end up being useful.
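
A small sketch of the difference, again using the existing numpy.ma module as
a stand-in for the proposed feature (an assumption for illustration).
Overwriting with NaN destroys the value, a soft mask can be removed again, and
a hard mask stays in place when a masked element is assigned to.

```python
import numpy as np

values = np.array([1.0, 5.0, 3.0])

# "Just using NaN": the original 5.0 is overwritten and cannot be recovered.
with_nan = values.copy()
with_nan[1] = np.nan

# Soft mask: hide the element for one computation, then bring it back.
soft = np.ma.array(values, mask=[False, True, False])
masked_mean = soft.mean()                 # mean of 1.0 and 3.0
soft.mask = np.ma.nomask                  # unmask everything
full_mean = soft.mean()                   # mean of 1.0, 5.0 and 3.0

# Hard mask: assigning to a masked element does not unmask it.
hard = np.ma.array(values.copy(), mask=[False, True, False], hard_mask=True)
hard[1] = 99.0

print(masked_mean, full_mean, hard.mask)
```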
> Christopher Barker, Ph.D.
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception