[Numpy-discussion] missing data discussion round 2
Tue Jun 28 18:00:43 CDT 2011
On Wed, Jun 29, 2011 at 1:40 AM, Jason Grout <firstname.lastname@example.org> wrote:
> On 6/28/11 5:20 PM, Matthew Brett wrote:
> > Hi,
> > On Tue, Jun 28, 2011 at 4:06 PM, Nathaniel Smith <email@example.com> wrote:
> > ...
> >> (You might think, what difference does it make if you *can* unmask an
> >> item? Us missing data folks could just ignore this feature. But:
> >> whatever we end up implementing is something that I will have to
> >> explain over and over to different people, most of them not
> >> particularly sophisticated programmers. And there's just no sensible
> >> way to explain this idea that if you store some particular value, then
> >> it replaces the old value, but if you store NA, then the old value is
> >> still there.
> > Ouch - yes. No question, that is difficult to explain. Well, I
> > think the explanation might go like this:
> > "Ah, yes, well, that's because in fact numpy records missing values by
> > using a 'mask'. So when you say `a = np.NA', what you mean is,
> > `a._mask = np.ones(a.shape, np.dtype(bool)); a._mask = False'"
> > Is that fair?
> Maybe instead of np.NA, we could say np.IGNORE, which sort of conveys
> the idea that the entry is still there, but we're just ignoring it. Of
> course, that goes against common convention, but it might be easier to
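The mask semantics Matthew and Nathaniel describe can be illustrated with NumPy's existing `numpy.ma` module. This is a minimal sketch assuming a reasonably recent NumPy; `numpy.ma` is not the proposal under discussion, but it uses the same idea of a separate boolean mask. Assigning `np.ma.masked` only sets a mask bit, so the old value survives in `.data` and reappears once the mask is cleared.

```python
import numpy as np

# numpy.ma keeps the values and a separate boolean mask (True = ignored).
a = np.ma.array([1.0, 2.0, 3.0, 4.0])

# "Storing" a missing value only flips the mask bit for that entry;
# the underlying value 3.0 is left untouched in a.data.
a[2] = np.ma.masked
was_masked = bool(a.mask[2])   # True
underlying = a.data.copy()     # still [1., 2., 3., 4.]
total = a.sum()                # 7.0: the masked entry is skipped

# Clearing the mask "unmasks" the entry, and the old value comes back.
a.mask = np.ma.nomask
restored = a[2]                # 3.0
```

Under an NA-as-a-value design, the assignment would instead overwrite the 3.0, which is the behaviour Nathaniel argues is easier to explain.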
This is a very similar approach to how I have always treated NaNs.
(Thus postponing all the real (slightly dirty) work onto the imputation
For me it has been sufficient to ignore the actual cause of the NaNs. But
I believe there are plenty of other, much more sophisticated situations where
this kind of simple treatment is not sufficient at all. Anyway, even in
the future it should still be possible to play nicely with these kinds of
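The "ignore the NaNs now, impute later" workflow might look like the following minimal sketch. It assumes NaN is the only missing-value marker, and the mean imputation in the last step is an illustrative choice, not the author's stated method. Unlike a mask, writing NaN into an array discards whatever value was there before.

```python
import numpy as np

# NaN marks a missing entry; its cause is deliberately ignored.
x = np.array([1.0, np.nan, 3.0, 5.0])

# Analysis step: NaN-aware reductions simply skip the missing entries.
mean_ignoring_nan = np.nanmean(x)                # mean of 1, 3, 5 -> 3.0
count_present = np.count_nonzero(~np.isnan(x))   # 3

# Deferred imputation step (illustrative choice: fill with the mean).
imputed = np.where(np.isnan(x), mean_ignoring_nan, x)  # [1., 3., 3., 5.]
```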