[Numpy-discussion] missing data discussion round 2

Mark Wiebe mwwiebe@gmail....
Tue Jun 28 18:37:17 CDT 2011


On Tue, Jun 28, 2011 at 3:45 PM, Pierre GM <pgmdevlist@gmail.com> wrote:

> All,
> I'm not sure I understand some aspects of Mark's new proposal, sorry (blame
> the lack of sleep).
> I'm pretty excited about the idea of built-in NA like
> np.dtype(NA['float64']), provided we can come up with some shortcuts like
> np.nafloat64.


This could be created at NumPy startup time, no problem.
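To make the bit-pattern ("NA dtype") idea concrete, here is a minimal sketch that marks NA in a plain float64 array by writing a reserved NaN bit pattern into the data itself, so the "mask" is burned into the values. The specific payload `NA_BITS` and the helpers `make_na_array` and `isna` are hypothetical illustrations, not the NEP's encoding or any NumPy API.

```python
import numpy as np

# Hypothetical NA encoding: a quiet NaN (0x7FF8...) carrying an arbitrary
# reserved payload, so it can be told apart from an ordinary NaN.
NA_BITS = np.uint64(0x7FF80000000007A2)

def make_na_array(values, na_positions):
    """Build a float64 array and overwrite the given positions with NA bits."""
    arr = np.array(values, dtype=np.float64)
    # view() shares memory, so writing the raw bits modifies arr in place.
    arr.view(np.uint64)[na_positions] = NA_BITS
    return arr

def isna(arr):
    """Return a boolean array that is True exactly where the NA pattern occurs."""
    return arr.view(np.uint64) == NA_BITS

a = make_na_array([1.0, 2.0, 3.0, 4.0], [1, 3])
print(isna(a))          # NA only at positions 1 and 3
print(np.isnan(a))      # the NA pattern is still a NaN to ordinary code
```

Because the NA lives in the value, no separate mask can be lost. The cost is that IEEE 754 does not guarantee a NaN payload survives arithmetic, so a real implementation would have to check for the pattern explicitly in every operation rather than rely on NaN propagation.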


> I think that would really take care of the missing data part in a
> consistent and non-ambiguous way.
> However, I understand that if a choice has to be made, this approach would
> be dropped in favor of the more generic "mask way", right? (By "mask way", I
> mean something close to, but better optimized than, the numpy.ma approach.)
>

The NEP proposes strict NA missing value semantics, where the only way to
get at the masked values is by having another view that doesn't have the
value masked. If someone has use cases where this prevents some
functionality they need, I'd love to hear them.
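The "another view without the mask" idea can be approximated with the existing numpy.ma module, which wraps an ndarray without copying it by default. This is only an analogy: numpy.ma is looser than the strict NA semantics described above, since `masked.data` still exposes the hidden values through the masked object itself.

```python
import numpy as np

# Plain ndarray holding every value; it carries no mask.
base = np.array([1.0, 2.0, 3.0, 4.0])

# A masked view over the same memory (numpy.ma does not copy by default).
masked = np.ma.masked_array(base, mask=[False, True, False, True])

# Reductions through the masked view skip the masked elements...
total = masked.sum()        # 1.0 + 3.0

# ...while the unmasked view still sees every value.
full_total = base.sum()     # 1.0 + 2.0 + 3.0 + 4.0
```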

> So, taking this example:
> >>> np.add(a, b, out=b, mask=(a > threshold))
> If 'b' doesn't already have a mask, will masked values be lost if we go the
> mask way, but kept if we go the bit way? If so, I prefer the latter.
> Another advantage I see in the "bit way" is that it's pretty close to the
> 'hardmask' idea. You never risk losing the mask, as it's already "burned"
> into the array...
>

I've nearly finished implementing this parameter, and decided to call it
'where' instead, because it operates like an SQL WHERE clause. Here, if
neither a nor b is a masked array, it will only modify those values of b
where the 'where' parameter is True.
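Ufuncs in released NumPy versions accept a `where=` keyword with this behavior, so the example from the thread can be run on plain (unmasked) arrays. The values of `a`, `b`, and `threshold` below are illustrative only.

```python
import numpy as np

a = np.array([1.0, 5.0, 3.0, 7.0])
b = np.array([10.0, 10.0, 10.0, 10.0])
threshold = 4.0

# Only positions where (a > threshold) is True are written into b;
# every other element of b keeps its previous value.
np.add(a, b, out=b, where=(a > threshold))
print(b)  # [10. 15. 10. 17.]
```

Passing `out=` matters here: without it, the elements skipped by `where` in the freshly allocated result are left uninitialized.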

> And now for something not that completely different:
> * Would it be possible to store only the addresses of the NAs internally
> (in the metadata?) to save some space, and still get a boolean array with
> the same shape as the underlying array when the .mask or .valid property is
> accessed?
>

Something like this could be possible, but would certainly complicate the
implementation. If it were desired, it would be a follow-up feature.
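A minimal sketch of the sparse representation asked about: keep only the flat indices of NA elements and expand them into a full boolean array on demand. The class `SparseNAMask` is hypothetical, and it follows the numpy.ma convention that `.mask` is True where a value is missing, with `.valid` as its complement.

```python
import numpy as np

class SparseNAMask:
    """Hypothetical: store only the flat indices of NA elements."""

    def __init__(self, shape, na_flat_indices):
        self.shape = tuple(shape)
        # Sorted, de-duplicated flat (C-order) indices of the NA elements.
        self.na_flat = np.unique(np.asarray(na_flat_indices, dtype=np.intp))

    @classmethod
    def from_dense(cls, mask):
        """Build from a dense boolean mask (True = NA)."""
        mask = np.asarray(mask, dtype=bool)
        return cls(mask.shape, np.flatnonzero(mask))

    @property
    def mask(self):
        """Expand to a dense boolean array, True where the value is NA."""
        dense = np.zeros(self.shape, dtype=bool)
        dense.flat[self.na_flat] = True
        return dense

    @property
    def valid(self):
        """Dense boolean array, True where the value is present."""
        return ~self.mask

dense = np.array([[False, True, False],
                  [False, False, True]])
sm = SparseNAMask.from_dense(dense)
print(sm.na_flat)   # [1 5]
```

Storage drops to one index per NA, but every `.mask` or `.valid` access rebuilds a full boolean array, so this only pays off when NAs are rare.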

-Mark



