[Numpy-discussion] missing data discussion round 2

Mark Wiebe mwwiebe@gmail....
Tue Jun 28 19:47:51 CDT 2011


On Tue, Jun 28, 2011 at 6:56 PM, Pierre GM <pgmdevlist@gmail.com> wrote:

>
> On Jun 29, 2011, at 1:37 AM, Mark Wiebe wrote:
>
> > On Tue, Jun 28, 2011 at 3:45 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
> > ...
> >
> > I think that would really take care of the missing data part in a
> consistent and non-ambiguous way.
> > However, I understand that if a choice had to be made, this approach would
> be dropped for the more generic "mask way", right ? (By "mask way", I mean
> something that is close (but actually optimized) to the numpy.ma approach).
> >
> > The NEP proposes strict NA missing value semantics, where the only way to
> get at the masked values is by having another view that doesn't have the
> value masked. If someone has use cases where this prevents some
> functionality they need, I'd love to hear them.
>
> Mmh... Would you have an example ? I haven't caught up with my lack of
> sleep yet...


Sure, I'll copy one I made for the NEP for starters:

>>> a = np.array([1,2])
>>> b = a.view()
>>> b.flags.hasmask = True
>>> b
array([1,2], masked=True)
>>> b[0] = np.NA
>>> b
array([NA,2], masked=True)
>>> a
array([1,2])
>>> # The underlying number 1 value in 'a[0]' was untouched

At this point, there is no way to access the number 1 value using 'b', just
using 'a'. If we assign to 'b[0]' it will also change a[0]:

>>> b[0] = 3
>>> a
array([3,2])
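
For comparison, here is a rough sketch of the same view behaviour using the existing numpy.ma module rather than the API proposed in the NEP. It is an analogy only: numpy.ma does not have a 'hasmask' flag or np.NA, and, unlike the strict semantics described above, it still lets you reach the hidden value through 'b.data'. The names 'a_after_mask' and 'hidden' are just for illustration.

```python
import numpy as np

a = np.array([1, 2])
# Wrap 'a' without copying, so 'b' shares a's data buffer.
b = np.ma.array(a, copy=False)

b[0] = np.ma.masked       # masks the element; the stored 1 is left alone
a_after_mask = a.copy()   # snapshot of 'a', still [1, 2]

# numpy.ma is not strict: the value under the mask is still reachable.
hidden = b.data[0]        # 1

b[0] = 3                  # assigning unmasks b[0] and writes through to 'a'
print(a_after_mask, hidden, a, b)
```

The write-through on the last assignment is the part that matches the NEP example; the 'b.data' escape hatch is the part the NEP's strict semantics would rule out.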


> >
> > So, taking this example
> > >>> np.add(a, b, out=b, mask=(a > threshold))
> > If 'b' doesn't already have a mask, masked values will be lost if we go
> the mask way ? But kept if we go the bit way ? I prefer the latter, then
> > Another advantage I see in the "bit way" is that it's pretty close to the
> 'hardmask' idea. You'll never risk losing the mask as it's already "burned"
> in the array...
> >
> > I've nearly finished this parameter, and decided to call it 'where'
> instead, because it operates like an SQL WHERE clause. Here, if neither a
> nor b is a masked array, it will only modify those values of b where the
> 'where' parameter has the value True.
>
> OK, sounds fine. Pretty fine, actually. Just to be clear, if 'out' is not
> defined, the result is a masked array with 'where' as mask. What's the value
> below the mask ? np.NA ?
>

The value below the mask is like the result of np.empty(), and with strict
missing value semantics, it shouldn't be possible to ever get at it (except
by breaking the rules from C code).
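
A small sketch of the unmasked case, using the ufunc 'where' keyword as it exists in released NumPy (which may differ in detail from the version under development in this thread). The arrays and the 'threshold' value are made up for illustration. No masked array is produced here, so without 'out' the unselected positions simply hold whatever np.empty() would have given.

```python
import numpy as np

a = np.array([1, 5, 10, 20])
b = np.array([100, 100, 100, 100])
threshold = 7  # illustrative value

# Only positions where the condition is True are written into 'out'.
np.add(a, b, out=b, where=(a > threshold))
print(b)  # [100 100 110 120]

# Without 'out', the unselected positions are left uninitialized,
# so only the selected entries of the result are meaningful.
sel = a > threshold
c = np.add(a, a, where=sel)
print(c[sel])  # [20 40]
```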

> And now for something not that completely different:
> > * Would it be possible to store internally the addresses of the NAs only
> to save some space (in the metadata ?) and when the .mask or .valid property
> is called, to still get a boolean array with the same shape as the
> underlying array ?
> >
> > Something like this could be possible, but would certainly complicate the
> implementation. If it were desired, it would be a follow-up feature.
>
> Oh, no problem. I was suggesting a way to save some space, but if it's too
> tricky to implement, forget it.
>

Cool. I think the semantics of views of masks might not work with that
storage scheme either. With a bit-level mask the views would still be
possible but more complicated than normal views.
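
To make the trade-off concrete, here is a minimal sketch of both compact layouts using plain NumPy calls. It is only an illustration of the storage idea, not anything from the NEP implementation, and the variable names are hypothetical.

```python
import numpy as np

mask = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 1], dtype=bool)

# Layout 1: keep only the flat indices of the NA elements and
# rebuild a full boolean array on demand.
na_index = np.flatnonzero(mask)            # [1, 4, 5, 9]
rebuilt = np.zeros(mask.shape, dtype=bool)
rebuilt.flat[na_index] = True

# Layout 2: one bit per element, eight elements per byte.
packed = np.packbits(mask)                 # 2 bytes for 10 elements
unpacked = np.unpackbits(packed, count=mask.size).astype(bool)

# A view such as mask[3:7] starts in the middle of a byte, so it cannot
# be expressed as an ordinary slice of 'packed'; here we unpack first.
sub = unpacked[3:7]
print(na_index, packed.size, sub)
```

In both layouts a slice of the array no longer corresponds to a simple strided slice of the mask storage, which is where the extra complexity for views comes from.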

-Mark

