[Numpy-discussion] missing data discussion round 2

Matthew Brett matthew.brett@gmail....
Wed Jun 29 15:25:24 CDT 2011


Hi,

On Wed, Jun 29, 2011 at 9:17 PM, Charles R Harris
<charlesr.harris@gmail.com> wrote:
>
>
> On Wed, Jun 29, 2011 at 1:32 PM, Matthew Brett <matthew.brett@gmail.com>
> wrote:
>>
>> Hi,
>>
>> On Wed, Jun 29, 2011 at 6:22 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
>> > On Wed, Jun 29, 2011 at 8:20 AM, Lluís <xscript@gmx.net> wrote:
>> >>
>> >> Matthew Brett writes:
>> >>
>> >> >> Maybe instead of np.NA, we could say np.IGNORE, which sort of conveys
>> >> >> the idea that the entry is still there, but we're just ignoring it.
>> >> >> Of course, that goes against common convention, but it might be
>> >> >> easier to explain.
>> >>
>> >> > I think Nathaniel's point is that np.IGNORE is a different idea than
>> >> > np.NA, and that is why joining the implementations can lead to
>> >> > conceptual confusion.
>> >>
>> >> This is how I see it:
>> >>
>> >> >>> a = np.array([0, 1, 2], dtype=int)
>> >> >>> a[0] = np.NA
>> >> ValueError
>> >> >>> e = np.array([np.NA, 1, 2], dtype=int)
>> >> ValueError
>> >> >>> b  = np.array([np.NA, 1, 2], dtype=np.maybe(int))
>> >> >>> m  = np.array([np.NA, 1, 2], dtype=int, masked=True)
>> >> >>> bm = np.array([np.NA, 1, 2], dtype=np.maybe(int), masked=True)
>> >> >>> b[1] = np.NA
>> >> >>> np.sum(b)
>> >> np.NA
>> >> >>> np.sum(b, skipna=True)
>> >> 2
>> >> >>> b.mask
>> >> None
>> >> >>> m[1] = np.NA
>> >> >>> np.sum(m)
>> >> 2
>> >> >>> np.sum(m, skipna=True)
>> >> 2
>> >> >>> m.mask
>> >> [False, False, True]
>> >> >>> bm[1] = np.NA
>> >> >>> np.sum(bm)
>> >> 2
>> >> >>> np.sum(bm, skipna=True)
>> >> 2
>> >> >>> bm.mask
>> >> [False, False, True]
>> >>
>> >> So:
>> >>
>> >> * Mask takes precedence over bit pattern on element assignment. There's
>> >>  still the question of how to assign a bit pattern NA when the mask is
>> >>  active.
>> >>
>> >> * When using mask, elements are automagically skipped.
>> >>
>> >> * "m[1] = np.NA" is equivalent to "m.mask[1] = False"
>> >>
>> >> * When using bit pattern + mask, it might make sense to have the
>> >>  initial values as bit-pattern NAs, instead of masked (i.e.,
>> >>  "bm.mask == [True, False, True]" and "np.sum(bm) == np.NA")
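The semantics in the example above can be roughly approximated with what NumPy offers today. This is only a sketch under stated assumptions: `np.NA`, `np.maybe(int)`, `masked=True` and `skipna=True` were proposals in this thread and are not part of the NumPy API. Here NaN stands in for a bit-pattern NA (floating point only), `np.nansum` stands in for `skipna=True`, and `numpy.ma` stands in for the mask implementation. Note that `numpy.ma` uses True = masked, the opposite of the True = valid convention in the example.

```python
import numpy as np

# Bit-pattern stand-in (IBP): NaN lives in the data itself.
# Analogy only; np.NA / np.maybe(int) do not exist in NumPy.
b = np.array([np.nan, 1.0, 2.0])
b[1] = np.nan
total_b = np.sum(b)        # NaN propagates, like np.sum(b) -> np.NA
skipped_b = np.nansum(b)   # explicit skipping, like skipna=True

# Mask stand-in (IM): numpy.ma keeps a separate boolean mask.
# numpy.ma convention: True means "masked" (invalid).
m = np.ma.masked_array([0.0, 1.0, 2.0], mask=[True, False, False])
m[1] = np.ma.masked        # sets the mask; the data value is left alone
total_m = m.sum()          # masked elements are skipped automatically

print(total_b, skipped_b, total_m)  # nan 2.0 2.0
print(m.mask)                       # [ True  True False]
```

With the NaN stand-in, skipping is opt-in per call; with the mask stand-in, skipping is the default, which mirrors the "elements are automagically skipped" point above.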
>> >
>> > There seems to be a general idea that masks and NA bit patterns imply
>> > particular differing semantics, something which I think is simply false.
>>
>> Well, first, it's surely helpful to separate the concepts from the
>> implementation.
>>
>> Concepts / use patterns (as delineated by Nathaniel):
>> A) missing values == 'np.NA' in my emails.  Can we call that CMV
>> (concept missing values)?
>> B) masks == np.IGNORE in my emails. CMSK (concept masks)?
>>
>> Implementations
>> 1) bit-pattern == na-dtype - how about we call that IBP
>> (implementation bit pattern)?
>> 2) array.mask.  IM (implementation mask)?
>>
>
> Remember that the masks are invisible; you can't see them. They are an
> implementation detail. A good reason to hide the implementation is so it can
> be changed without impacting software that depends on the API.

It's not true that you can't see them: masks use the same API as
missing values. Because they share that API, a person using the CMV
features will soon find out about the masks, accidentally or not, and
will then need to understand masking. That is the problem we're
discussing here.
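One way the difference leaks through is that a masked value is still stored, while a bit-pattern NA overwrites it. The sketch below illustrates this with `numpy.ma` as a stand-in for the mask implementation and NaN as a stand-in for a bit-pattern NA; neither is the design proposed in this thread, and the variable names are purely illustrative.

```python
import numpy as np

values = [0.0, 1.0, 2.0]

# Mask stand-in (numpy.ma): "missing" hides the value but keeps it in .data.
m = np.ma.masked_array(values)
m[1] = np.ma.masked
hidden = m.data[1]                                  # still 1.0 underneath
revealed = np.ma.masked_array(m.data, mask=False)   # drop the mask entirely
revealed_sum = revealed.sum()                       # 3.0: the value counts again

# Bit-pattern stand-in (NaN): "missing" overwrites the value itself.
b = np.array(values)
b[1] = np.nan
lost = b[1]                                         # nan: nothing to recover

print(hidden, revealed_sum, lost)  # 1.0 3.0 nan
```

A user who only thinks in terms of missing values (CMV) would expect the second behaviour, so the recoverable value in the first case is exactly the kind of masking detail they end up having to learn about.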

See you,

Matthew

