[Numpy-discussion] alterNEP - was: missing data discussion round 2

Pierre GM pgmdevlist@gmail....
Thu Jun 30 08:58:06 CDT 2011


On Jun 30, 2011, at 3:31 PM, Matthew Brett wrote:
> ################################################
> An alternative-NEP on masking and missing values
> ################################################

I like the idea of two different special values: np.NA for missing values and np.IGNORE for masked values. np.NA values in an array correspond to what numpy.ma implements as a 'hard mask' (where you can't unmask data), while np.IGNORE values correspond to the regular .mask in numpy.ma. That looks fairly unambiguous.
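
For reference, the numpy.ma 'hard mask' behaviour mentioned above can be sketched as follows. This uses only the existing numpy.ma API (the `hard_mask` argument and the `hardmask` attribute), not the proposed np.NA/np.IGNORE values, and the variable name is illustrative:

```python
import numpy as np
import numpy.ma as ma

# Illustrative data: the third element starts out masked.
hard = ma.array([1.0, 2.0, 3.0, 7.0],
                mask=[False, False, True, False],
                hard_mask=True)

# With a hard mask, assigning a regular value does not unmask the element.
hard[2] = 99.0

print(hard.hardmask)  # whether the mask is hard
print(hard.mask)      # mask after the attempted assignment
```

The masked element stays masked after the assignment, which is roughly the behaviour an np.NA item would have under this reading.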


> **************
> Initialization
> **************
> 
> First, missing values can be set and be displayed as ``np.NA, NA``::
> 
> >>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
>    array([1., 2., NA, 7.], dtype='NA[<f8]')
> 
> As the initialization is not ambiguous, this can be written without the NA
> dtype::
> 
> >>> np.array([1.0, 2.0, np.NA, 7.0])
>    array([1., 2., NA, 7.], dtype='NA[<f8]')
> 
> Masked values can be set and be displayed as ``np.MASKED, MASKED``::
> 
> >>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
>    array([1., 2., MASKED, 7.], masked=True)
> 
> As the initialization is not ambiguous, this can be written without
> ``masked=True``::
> 
> >>> np.array([1.0, 2.0, np.MASKED, 7.0])
>    array([1., 2., MASKED, 7.], masked=True)

I'm not happy with this 'masked' parameter, at all. What's the point? Either you have np.NAs and/or np.IGNOREs or you don't. I'm probably missing something here.


> ******
> Ufuncs
> ******

All fine.
> 
> **********
> Assignment
> **********
> 
> is obvious in the NA case::
> 
> >>> arr = np.array([1.0, 2.0, 7.0])
> >>> arr[2] = np.NA
>    TypeError('dtype does not support NA')
> >>> na_arr = np.array([1.0, 2.0, 7.0], dtype='NA[f8]')
> >>> na_arr[2] = np.NA
> >>> na_arr
>    array([1., 2., NA], dtype='NA[<f8]')

OK


> 
> Direct assignment in the masked case is magic and confusing, and so happens only
> via the mask::
> 
> >>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
> >>> masked_arr[2] = np.NA
>    TypeError('dtype does not support NA')
> >>> masked_arr[2] = np.MASKED
>    TypeError('float() argument must be a string or a number')
> >>> masked_arr.visible[2] = False
> >>> masked_arr
>    array([1., 2., MASKED], masked=True)

What about the reverse case, when you assign a regular value to an np.NA/np.IGNORE item?
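
As a point of comparison for that reverse case, here is a minimal sketch of what numpy.ma does today with its default (soft) mask. It only uses existing numpy.ma calls; nothing here is the proposed np.NA/np.IGNORE API, and `arr` and the helper variable names are illustrative:

```python
import numpy as np
import numpy.ma as ma

arr = ma.array([1.0, 2.0, 7.0])

# Masking by direct assignment: the mask is set, the stored data is kept.
arr[2] = ma.masked
masked_after_mask = arr.mask.tolist()
data_kept = float(arr.data[2])

# Reverse case: assigning a regular value to a masked item
# overwrites the data and clears the mask (soft mask only).
arr[2] = 5.0
masked_after_assign = arr.mask.tolist()

print(masked_after_mask, data_kept, masked_after_assign)
```

Whether np.IGNORE items should follow this soft-mask behaviour, and np.NA items refuse the assignment or simply take the new value, is exactly what the proposal would need to spell out.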

