[Numpy-discussion] NA masks for NumPy are ready to test

Bruce Southey bsouthey@gmail....
Fri Aug 19 18:52:53 CDT 2011


On Fri, Aug 19, 2011 at 3:05 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
> On Fri, Aug 19, 2011 at 11:44 AM, Charles R Harris
> <charlesr.harris@gmail.com> wrote:
>>
>>
>> On Fri, Aug 19, 2011 at 12:37 PM, Bruce Southey <bsouthey@gmail.com>
>> wrote:
>>>
>>> Hi,
>>> Just some immediate minor observations that are really about trying to
>>> be consistent:
>>>
>>> 1) Could the NA dtype be displayed the same way as the array's dtype?
>>> For example, NA dtype is displayed as '<f8' but should be displayed as
>>> 'float64' as that is the array dtype.
>>> >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]])
>>> >>> a
>>> array([[  1.,   2.,   3., NA],
>>>       [  3.,   4.,  nan,   5.]])
>>> >>> a.dtype
>>> dtype('float64')
>>> >>> a.sum()
>>> NA(dtype='<f8')
>>>
>>> 2) Can the 'skipna' flag be added to the methods?
>>> >>> a.sum(skipna=True)
>>> Traceback (most recent call last):
>>>  File "<stdin>", line 1, in <module>
>>> TypeError: 'skipna' is an invalid keyword argument for this function
>>> >>> np.sum(a,skipna=True)
>>> nan
>>>
>>> 3) Can the skipna flag be extended to exclude other non-finite cases like
>>> NaN?
>>>
>>> 4) Assigning a np.NA needs a better error message but the Integer
>>> array case is more informative:
>>> >>> b=np.array([1,2,3,4], dtype=np.float128)
>>> >>> b[0]=np.NA
>>> Traceback (most recent call last):
>>>  File "<stdin>", line 1, in <module>
>>> TypeError: float() argument must be a string or a number
>>>
>>> >>> j=np.array([1,2,3])
>>> >>> j
>>> array([1, 2, 3])
>>> >>> j[0]=ina
>>> Traceback (most recent call last):
>>>  File "<stdin>", line 1, in <module>
>>> TypeError: int() argument must be a string or a number, not
>>> 'numpy.NAType'
>>>
>>> But it is nice that np.NA 'adjusts' to the insertion array:
>>> >>> b.flags.maskna = True
>>> >>> ana
>>> NA(dtype='<f8')
>>> >>> b[0]=ana
>>> >>> b[0]
>>> NA(dtype='<f16')
>>>
>>> 5) Different display depending on masked state. That is I think that
>>> 'maskna=True' should be displayed always when flags.maskna is True :
>>> >>> j=np.array([1,2,3], dtype=np.int8)
>>> >>> j
>>> array([1, 2, 3], dtype=int8)
>>> >>> j.flags.maskna=True
>>> >>> j
>>> array([1, 2, 3], maskna=True, dtype=int8)
>>> >>> j[0]=np.NA
>>> >>> j
>>> array([NA, 2, 3], dtype=int8) # I think it should still display
>>> 'maskna=True'.
>>>
>>
>> My main peeve is that NA is upper case ;) I suppose that could use some
>> discussion.
>
> There is some proliferation of cases in the NaN case:
> >>> np.nan
> nan
> >>> np.NAN
> nan
> >>> np.NaN
> nan
> The pros I see for NA over na are:
> * less confusion of NA vs nan (should this carry over to the np.isna
> function, should it be np.isNA according to this point?)
> * more comfortable for switching between NumPy and R when people have to use
> both at the same time
> The main con is:
> * Inconsistent with current nan, inf printing. Here's a hackish workaround:
> >>> np.na = np.NA
> >>> np.set_printoptions(nastr='na')
> >>> np.array([np.na, 2.0])
> array([na,  2.])
> What's your list of pros and cons?
> -Mark
>
>>
>> Chuck
>>

In part I sort of like having upper-case NA alongside lower-case nan,
since it avoids problems from poor eyesight, typing, or editing that
drop the last 'n' of nan.

Regarding nan/NAN, do you mean something like my ticket 1051?
http://projects.scipy.org/numpy/ticket/1051
I do not care that much about the case (mixed case is not good)
provided that there is only one way to specify each of these.

Also, should np.isfinite() return False for np.NA?
>>> np.isfinite([1,2,np.NA,4])
array([ True,  True, NA,  True], dtype=bool)
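
To make the two possible answers concrete, here is a minimal sketch in
plain Python. It is only an illustration of the semantics being asked
about, not the NumPy implementation: NAType, NA, and the three function
names are hypothetical stand-ins, and the skipnonfinite keyword is my
own invention for point 3 of my earlier list.

```python
import math


class NAType:
    """Hypothetical stand-in for the proposed np.NA singleton."""

    _instance = None

    def __new__(cls):
        # Always hand back the same object so `is NA` checks work.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "NA"


NA = NAType()


def isfinite_propagate(values):
    """NA in, NA out: mirrors the np.isfinite output shown above."""
    return [NA if v is NA else math.isfinite(v) for v in values]


def isfinite_strict(values):
    """Alternative: treat NA as not finite, giving a purely boolean result."""
    return [v is not NA and math.isfinite(v) for v in values]


def sum_skipping(values, skipna=False, skipnonfinite=False):
    """Sum that can skip NA and, optionally, NaN/inf as well (assumed flag)."""
    total = 0.0
    for v in values:
        if v is NA:
            if skipna:
                continue
            return NA  # an unskipped NA makes the whole sum NA
        if skipnonfinite and not math.isfinite(v):
            continue
        total += v
    return total


data = [1.0, 2.0, NA, 4.0]
print(isfinite_propagate(data))  # [True, True, NA, True]
print(isfinite_strict(data))     # [True, True, False, True]
```

With the values from my first example, sum_skipping(..., skipna=True)
still gives nan because the NaN is summed, while adding
skipnonfinite=True gives 18.0.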

Anyhow, many thanks for the replies to my observations and your
amazing effort in getting this done.

Bruce

