[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Christopher Barker Chris.Barker@noaa....
Fri Jun 24 11:13:51 CDT 2011

Nathaniel Smith wrote:
>> The 'dtype factory' idea builds on the way I've structured datetime as a
>> parameterized type,


Another disadvantage is that we get further from Gael Varoquaux's point: 
>> Right now, the numpy array can be seen as an extension of the C
>> array, basically a pointer, a data type, and a shape (and strides).
>>  This enables easy sharing with libraries that have not been
>> written with numpy in mind.

and also from PEP 3118 support.

It is very useful that a numpy array has a pointer to a regular old C 
array. If we introduce this special dtype, that will break (well, not 
really, but the C array would be an array of this particular struct). 
Granted, any other C code would probably have to do something with the 
mask anyway, but I still think it'd be better to keep that raw data 
array standard.
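A minimal sketch of the layout difference, using a NumPy structured dtype as a stand-in for the proposed parameterized "maybe" dtype (the field names `value` and `valid` are hypothetical, not part of any proposal). A plain float64 array exports a PEP 3118 buffer of contiguous 8-byte doubles; a value-plus-flag struct puts a flag byte after every element, so C code expecting a `double *` cannot read it directly.

```python
import numpy as np

# Plain float64 array: its buffer is an ordinary C array of doubles.
plain = np.array([1.0, 2.0, 3.0])
buf = memoryview(plain)  # PEP 3118 buffer export

# Hypothetical stand-in for a "maybe(float)" dtype: value plus validity flag
# packed into each element (align=False by default, so no padding).
maybe_dtype = np.dtype([("value", np.float64), ("valid", np.bool_)])
maybe = np.zeros(3, dtype=maybe_dtype)
maybe["value"] = plain
maybe["valid"] = True

print(buf.itemsize, buf.strides)      # 8 (8,)
print(maybe.itemsize, maybe.strides)  # 9 (9,)
```

The 9-byte stride is the point: the doubles are no longer contiguous, so even reading the values from C means knowing about the struct.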

This applies to switching between masked and not-masked numpy arrays 
as well: I don't think I'd want the performance hit of that requiring a 
data copy.
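For comparison, a sketch using the existing `numpy.ma` module, where the mask lives in a separate array: wrapping and unwrapping share the original buffer, while moving the data into a value-plus-flag struct (again a hypothetical stand-in for the proposed dtype) needs a new buffer and a copy.

```python
import numpy as np

raw = np.array([1.0, 2.0, 3.0, 4.0])

# Add a separate mask: np.ma wraps the existing buffer (copy=False is the default).
masked = np.ma.masked_array(raw, mask=[False, True, False, False])

# Drop the mask again: .data returns a view of the same buffer.
unmasked = masked.data

# Moving into an interleaved value-plus-flag layout needs a new buffer.
packed = np.zeros(raw.shape, dtype=[("value", np.float64), ("valid", np.bool_)])
packed["value"] = raw  # element-by-element copy

print(np.shares_memory(raw, masked.data), np.shares_memory(raw, unmasked))  # True True
print(np.shares_memory(raw, packed))  # False
```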

Also, the idea was posted here that you could use views to have the same 
data set with different masks; that would break as well.
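With a separate mask this already works in `numpy.ma`. A small sketch with made-up values: two masked arrays over one buffer, each with its own mask, both seeing any change to the shared data.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# Two masked views of one buffer, each with its own mask.
drop_second = np.ma.masked_array(data, mask=[False, True, False, False])
drop_last = np.ma.masked_array(data, mask=[False, False, False, True])

print(drop_second.sum(), drop_last.sum())  # 8.0 6.0

# A change to the shared data shows up through both views.
data[0] = 10.0
print(drop_second.sum(), drop_last.sum())  # 17.0 15.0
```

If the mask lived inside each element, as a "maybe" dtype would put it, both views would have to share a single mask.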

Nathaniel Smith wrote:

> If we think that the memory overhead for floating point types is too
> high, it would be easy to add a special case where maybe(float) used a
> distinguished NaN instead of a separate boolean.
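For readers unfamiliar with the idea, a rough sketch of a "distinguished NaN": one specific quiet-NaN bit pattern is reserved to mean "missing". The payload `0x7A2` and the helpers `set_na` / `is_na` are hypothetical, chosen only for illustration. The sketch only detects values that were explicitly stored with that pattern; NaN payloads are not guaranteed to survive arithmetic.

```python
import numpy as np

# Hypothetical "NA" bit pattern: a quiet NaN with a nonzero payload,
# so it differs from np.nan (payload 0).
NA_BITS = np.uint64(0x7FF8_0000_0000_07A2)

def set_na(arr, index):
    """Mark arr[index] as missing by writing the NA bit pattern in place."""
    arr.view(np.uint64)[index] = NA_BITS

def is_na(arr):
    """True where the float64 bits exactly match the NA pattern."""
    return arr.view(np.uint64) == NA_BITS

values = np.array([1.0, 2.0, np.nan])
set_na(values, 1)

print(is_na(values))     # [False  True False]
print(np.isnan(values))  # [False  True  True]
```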

That would be pretty cool, though in the past folks have made a good 
argument that even for floats, masks have significant advantages over 
"just using NaN". One is that you can mask and unmask a value for 
different operations without losing the value.
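A short `numpy.ma` illustration of that advantage, with made-up values: a masked value is skipped but still stored, so unmasking brings it back. Writing NaN over it destroys the original number.

```python
import numpy as np

readings = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
print(readings.sum())  # 4.0 (the masked 2.0 is skipped)

# Unmask everything: the original value is still in the data buffer.
readings.mask = False
print(readings.sum())  # 6.0

# With NaN as the marker, the 2.0 is overwritten and cannot be recovered.
with_nan = np.array([1.0, 2.0, 3.0])
with_nan[1] = np.nan
print(np.nansum(with_nan))  # 4.0
```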


Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

