[Numpy-discussion] Missing data again

Nathaniel Smith njs@pobox....
Wed Mar 7 17:21:33 CST 2012


On Wed, Mar 7, 2012 at 7:39 PM, Benjamin Root <ben.root@ou.edu> wrote:
> On Wed, Mar 7, 2012 at 1:26 PM, Nathaniel Smith <njs@pobox.com> wrote:
>> When it comes to "missing data", bitpatterns can do everything that
>> masks can do, are no more complicated to implement, and have better
>> performance characteristics.
>>
>
> Not true. Bitpatterns inherently destroy the data, while masks do not.

Yes, that's why I only wrote that this is true for "missing data", not
in general :-). If you have data that is being destroyed, then that's
not missing data, by definition. We don't have consensus yet on
whether that's the use case we are aiming for, but it's the one that
Pierre was worrying about.
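To make the distinction concrete, here is a minimal sketch that uses NaN as the bitpattern for a float array. NaN is an assumption chosen for illustration; the thread does not settle on a particular bitpattern. Writing the sentinel into the array overwrites whatever was stored there. That is harmless when the value was never known, and destructive when it was:

```python
import numpy as np

# Truly missing data: the third reading was never recorded, so storing
# a NaN bitpattern in its slot loses nothing.
readings = np.array([1.0, 2.0, np.nan, 4.0])
mean_missing = np.nanmean(readings)  # NaN-aware reduction skips the slot

# Data that exists but should be ignored: writing the bitpattern in
# place overwrites the original value, and it cannot be recovered.
values = np.array([1.0, 2.0, 3.0, 4.0])
values[2] = np.nan
print(mean_missing, values)
```

In the first case the NaN carries no information loss; in the second, the 3.0 is gone unless the caller made a copy beforehand.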

> For matplotlib, we cannot use bitpatterns because they could overwrite user
> data (or we have to copy the data). I would imagine other extension writers
> would have similar issues when they need to play around with input data in a
> safe manner.

Right. You clearly need some sort of masking, either an explicit mask
array that you keep somewhere, or one that gets attached to the
underlying ndarray in some non-destructive way.
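Both non-destructive options can be sketched as follows, assuming a float array and using `numpy.ma` as one existing example of a mask carried alongside the data (an illustration, not a proposal from this thread). Neither approach touches the caller's values:

```python
import numpy as np

data = np.array([1.0, 2.0, 100.0, 4.0])

# Option 1: an explicit boolean mask kept somewhere next to the data.
# True marks values to ignore; `data` itself is never modified.
ignore = data > 50
mean_explicit = data[~ignore].mean()

# Option 2: a mask attached to the array. With the default copy=False,
# np.ma.masked_array wraps `data` without copying it and stores the
# mask separately, so the underlying values stay intact.
masked = np.ma.masked_array(data, mask=ignore)
mean_attached = masked.mean()

print(mean_explicit, mean_attached, data)
```

The trade-off is bookkeeping: the explicit mask must be passed around by hand, while the attached mask travels with the array but requires mask-aware operations.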

> Also, I doubt that the performance characteristics for strings and integers
> are the same as they are for masks.

Not sure what you mean by this, but I'd be happy to hear more.

-- Nathaniel

