[Numpy-discussion] consensus (was: NA masks in the next numpy release?)

Ralf Gommers ralf.gommers@googlemail....
Sun Oct 30 14:24:32 CDT 2011


On Sat, Oct 29, 2011 at 11:55 PM, Matthew Brett <matthew.brett@gmail.com> wrote:

> Hi,
>
> On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers
> <ralf.gommers@googlemail.com> wrote:
> >
> >
> > On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett <matthew.brett@gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett <matthew.brett@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers
> >> > <ralf.gommers@googlemail.com> wrote:
> >> >>
> >> >>
> >> >> On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett
> >> >> <matthew.brett@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers
> >> >>> <ralf.gommers@googlemail.com> wrote:
> >> >>> >
> >> >>> >
> >> >>> > On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett
> >> >>> > <matthew.brett@gmail.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> Hi,
> >> >>> >>
> >> >>> >> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers
> >> >>> >> <ralf.gommers@googlemail.com> wrote:
> >> >>> >> >
> >> >>> >> >
> >> >>> >> > On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett
> >> >>> >> > <matthew.brett@gmail.com>
> >> >>> >> > wrote:
> >> >>> >> >>
> >> >>> >> >> Hi,
> >> >>> >> >>
> >> >>> >> >> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris
> >> >>> >> >> <charlesr.harris@gmail.com> wrote:
> >> >>> >> >> >>
> >> >>> >> >>
> >> >>> >> >> No, that's not what Nathaniel and I are saying at all. Nathaniel
> >> >>> >> >> was pointing to links for projects that care that everyone
> >> >>> >> >> agrees before they go ahead.
> >> >>> >> >
> >> >>> >> > It looked to me like there was a serious intent to come to an
> >> >>> >> > agreement, or at least closer together. The discussion in the
> >> >>> >> > summer was going around in circles though, and was too abstract
> >> >>> >> > and complex to follow. Therefore Mark's choice of implementing
> >> >>> >> > something and then asking for feedback made sense to me.
> >> >>> >>
> >> >>> >> I should point out that the implementation hasn't - as far as I can
> >> >>> >> see - changed the discussion.  The discussion was about the API.
> >> >>> >>
> >> >>> >> Implementations are useful for agreed APIs because they can point out
> >> >>> >> where the API does not make sense or cannot be implemented.  In this
> >> >>> >> case, the API Mark said he was going to implement - he did implement -
> >> >>> >> at least as far as I can see.  Again, I'm happy to be corrected.
> >> >>> >
> >> >>> > Implementations can also help the discussion along, by allowing people
> >> >>> > to try out some of the proposed changes. It also allows one to
> >> >>> > construct examples that show weaknesses, possibly to be solved by an
> >> >>> > alternative API. Maybe you can hold the complete history of this topic
> >> >>> > in your head and comprehend it, but for me it would be very helpful if
> >> >>> > someone said:
> >> >>> > - here's my dataset
> >> >>> > - this is what I want to do with it
> >> >>> > - this is the best I can do with the current implementation
> >> >>> > - here's how API X would allow me to solve this better or simpler
> >> >>> > This can be done much better with actual data and an actual
> >> >>> > implementation than with a design proposal. You seem to disagree with
> >> >>> > this statement. That's fine. I would hope though that you recognize
> >> >>> > that concrete examples help people like me, and construct one or two
> >> >>> > to help us out.
> >> >>> That's what use-cases are for in designing APIs.  There are examples
> >> >>> of use in the NEP:
> >> >>>
> >> >>> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
> >> >>>
> >> >>> the alterNEP:
> >> >>>
> >> >>> https://gist.github.com/1056379
> >> >>>
> >> >>> and my longer email to Travis:
> >> >>>
> >> >>> http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignored
> >> >>>
> >> >>> Mark has done a nice job of documentation:
> >> >>>
> >> >>> http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
> >> >>>
> >> >>> If you want to understand what the alterNEP case is, I'd suggest the
> >> >>> email, just because it's the most recent and I think the terminology
> >> >>> is slightly clearer.
> >> >>>
> >> >>> Doing the same examples on a larger array won't make the point easier
> >> >>> to understand.  The discussion is about what the right concepts are,
> >> >>> and you can help by looking at the snippets of code in those
> >> >>> documents, and deciding for yourself whether you think the current
> >> >>> masking / NA implementation seems natural and easy to explain, or
> >> >>> rather forced and difficult to explain, and then email back trying to
> >> >>> explain your impression (which is not always easy).
> >> >>
> >> >> If you seriously believe that looking at a few snippets is as helpful
> >> >> and instructive as being able to play around with them in IPython and
> >> >> modify them, then I guess we won't make progress in this part of the
> >> >> discussion. You're just telling me to go back and re-read things I'd
> >> >> already read.
> >> >
> >> > The snippets are in ipython or doctest format - aren't they?
> >>
> >> Oops - 10 minute rule.  Now I see that you mean that you can't
> >> experiment with the alternative implementation without working code.
> >
> > Indeed.
> >
> >>
> >> That's true, but I am hoping that the difference between - say:
> >>
> >> a[0:2] = np.NA
> >>
> >> and
> >>
> >> a.mask[0:2] = False
> >>
> >> would be easy enough to imagine.
> >
> > It is in this case. I agree the explicit ``a.mask`` is clearer. This is a
> > quite specific point that could be improved in the current
> > implementation.
>
> Thanks - this is helpful.
>

So was your example.
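
For readers who want something to run while following this point, the two
assignment styles can be roughly imitated with the existing numpy.ma module.
This is only an analogue under stated assumptions: numpy.ma is neither the
NA-mask implementation nor the alterNEP API, ``ma.masked`` stands in for
``np.NA``, and numpy.ma uses True to mean "hidden", which need not match the
polarity of the ``a.mask`` in the quoted snippet.

```python
import numpy as np
import numpy.ma as ma

# Style 1: assign a special "missing" value to the elements.
# ma.masked is used here as a stand-in for np.NA (an assumption, not the NA API).
a = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, False, False, False])
a[0:2] = ma.masked

# Style 2: leave the elements alone and edit the mask explicitly.
# In numpy.ma, True means "this element is hidden".
b = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, False, False, False])
b.mask[0:2] = True

# Both arrays now hide their first two elements, so reductions skip them.
print(a.mask.tolist(), float(a.sum()))
print(b.mask.tolist(), float(b.sum()))
```

The end state is the same in both cases; the difference under discussion is
whether the operation reads as "store a value" or as "change the mask".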

>
> > It doesn't require ripping everything out.
>
> Nathaniel wasn't proposing 'ripping everything out' - but backing off
> until consensus has been reached.  That's different.


I'm worried that in practice it won't be different. If you put such a large
amount of code in a branch, with no one lined up to work on
changing/improving/re-integrating it, the most likely thing to happen is
that it will just sit there in a branch, bitrot and eventually be lost.


> If you think we should not do that, and you are interested, please say why.
> Second - I was proposing that we do indeed keep the code in the
> codebase but discuss adaptations that could achieve consensus.
>

Glad to hear it. This is not what I understood from the email you linked to
earlier. Quoting: "Honestly, I think that NA should be a synonym for
ABSENT, and so should be removed until the dust has settled, and restored
as (np.NA == np.ABSENT)".

At this point I care much more about having a good implementation than
exactly which one; the similarities are much more important than the
differences. My main worry is that we end up with nothing.

As for the current situation and way forward, Eric Firing provided a much
better summary and list of important points than I have managed to
communicate so far. I agree with everything he said.

Cheers,
Ralf