[Numpy-discussion] consensus (was: NA masks in the next numpy release?)

Matthew Brett matthew.brett@gmail....
Sun Oct 30 14:32:53 CDT 2011


Hi,

On Sun, Oct 30, 2011 at 12:24 PM, Ralf Gommers
<ralf.gommers@googlemail.com> wrote:
>
>
> On Sat, Oct 29, 2011 at 11:55 PM, Matthew Brett <matthew.brett@gmail.com>
> wrote:
>>
>> Hi,
>>
>> On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers
>> <ralf.gommers@googlemail.com> wrote:
>> >
>> >
>> > On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett
>> > <matthew.brett@gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett
>> >> <matthew.brett@gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers
>> >> > <ralf.gommers@googlemail.com> wrote:
>> >> >>
>> >> >>
>> >> >> On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett
>> >> >> <matthew.brett@gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> Hi,
>> >> >>>
>> >> >>> On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers
>> >> >>> <ralf.gommers@googlemail.com> wrote:
>> >> >>> >
>> >> >>> >
>> >> >>> > On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett
>> >> >>> > <matthew.brett@gmail.com>
>> >> >>> > wrote:
>> >> >>> >>
>> >> >>> >> Hi,
>> >> >>> >>
>> >> >>> >> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers
>> >> >>> >> <ralf.gommers@googlemail.com> wrote:
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett
>> >> >>> >> > <matthew.brett@gmail.com>
>> >> >>> >> > wrote:
>> >> >>> >> >>
>> >> >>> >> >> Hi,
>> >> >>> >> >>
>> >> >>> >> >> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris
>> >> >>> >> >> <charlesr.harris@gmail.com> wrote:
>> >> >>> >> >> >>
>> >> >>> >> >>
>> >> >>> >> >> No, that's not what Nathaniel and I are saying at all.  Nathaniel
>> >> >>> >> >> was pointing to links for projects that care that everyone agrees
>> >> >>> >> >> before they go ahead.
>> >> >>> >> >
>> >> >>> >> > It looked to me like there was a serious intent to come to an
>> >> >>> >> > agreement, or at least closer together. The discussion in the
>> >> >>> >> > summer was going around in circles though, and was too abstract
>> >> >>> >> > and complex to follow. Therefore Mark's choice of implementing
>> >> >>> >> > something and then asking for feedback made sense to me.
>> >> >>> >>
>> >> >>> >> I should point out that the implementation hasn't - as far as I
>> >> >>> >> can see - changed the discussion.  The discussion was about the API.
>> >> >>> >>
>> >> >>> >> Implementations are useful for agreed APIs because they can point
>> >> >>> >> out where the API does not make sense or cannot be implemented.  In
>> >> >>> >> this case, the API Mark said he was going to implement - he did
>> >> >>> >> implement - at least as far as I can see.  Again, I'm happy to be
>> >> >>> >> corrected.
>> >> >>> >
>> >> >>> > Implementations can also help the discussion along, by allowing
>> >> >>> > people to try out some of the proposed changes. They also allow one
>> >> >>> > to construct examples that show weaknesses, possibly to be solved
>> >> >>> > by an alternative API. Maybe you can hold the complete history of
>> >> >>> > this topic in your head and comprehend it, but for me it would be
>> >> >>> > very helpful if someone said:
>> >> >>> > - here's my dataset
>> >> >>> > - this is what I want to do with it
>> >> >>> > - this is the best I can do with the current implementation
>> >> >>> > - here's how API X would allow me to solve this better or simpler
>> >> >>> > This can be done much better with actual data and an actual
>> >> >>> > implementation than with a design proposal. You seem to disagree
>> >> >>> > with this statement. That's fine. I would hope though that you
>> >> >>> > recognize that concrete examples help people like me, and construct
>> >> >>> > one or two to help us out.
>> >> >>> That's what use-cases are for in designing APIs.  There are examples
>> >> >>> of use in the NEP:
>> >> >>>
>> >> >>> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
>> >> >>>
>> >> >>> the alterNEP:
>> >> >>>
>> >> >>> https://gist.github.com/1056379
>> >> >>>
>> >> >>> and my longer email to Travis:
>> >> >>>
>> >> >>> http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignored
>> >> >>>
>> >> >>> Mark has done a nice job of documentation:
>> >> >>>
>> >> >>> http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
>> >> >>>
>> >> >>> If you want to understand what the alterNEP case is, I'd suggest the
>> >> >>> email, just because it's the most recent and I think the terminology
>> >> >>> is slightly clearer.
>> >> >>>
>> >> >>> Doing the same examples on a larger array won't make the point easier
>> >> >>> to understand.  The discussion is about what the right concepts are,
>> >> >>> and you can help by looking at the snippets of code in those
>> >> >>> documents, and deciding for yourself whether you think the current
>> >> >>> masking / NA implementation seems natural and easy to explain, or
>> >> >>> rather forced and difficult to explain, and then email back trying to
>> >> >>> explain your impression (which is not always easy).
>> >> >>
>> >> >> If you seriously believe that looking at a few snippets is as helpful
>> >> >> and instructive as being able to play around with them in IPython and
>> >> >> modify them, then I guess we won't make progress in this part of the
>> >> >> discussion. You're just telling me to go back and re-read things I'd
>> >> >> already read.
>> >> >
>> >> > The snippets are in ipython or doctest format - aren't they?
>> >>
>> >> Oops - 10 minute rule.  Now I see that you mean that you can't
>> >> experiment with the alternative implementation without working code.
>> >
>> > Indeed.
>> >
>> >>
>> >> That's true, but I am hoping that the difference between - say:
>> >>
>> >> a[0:2] = np.NA
>> >>
>> >> and
>> >>
>> >> a.mask[0:2] = False
>> >>
>> >> would be easy enough to imagine.
>> >
>> > It is in this case. I agree the explicit ``a.mask`` is clearer. This is
>> > a quite specific point that could be improved in the current
>> > implementation.
>>
>> Thanks - this is helpful.
>
> So was your example.
>>
>> > It doesn't require ripping everything out.
>>
>> Nathaniel wasn't proposing 'ripping everything out' - but backing off
>> until consensus has been reached.  That's different.
>
> I'm worried that in practice it won't be different. If you put such a large
> amount of code in a branch, with no one lined up to work on
> changing/improving/re-integrating it, the most likely thing to happen is
> that it will just sit there in a branch, bitrot and eventually be lost.
>
>>
>> If you think we should not do that, and you are interested, please say why.
>> Second - I was proposing that we do indeed keep the code in the
>> codebase but discuss adaptations that could achieve consensus.
>
> Glad to hear it. This is not what I understood from the email you linked to
> earlier. Quoting: "Honestly, I think that NA should be a synonym for ABSENT,
> and so should be removed until the dust has settled, and restored as (np.NA
> == np.ABSENT)".

I was proposing that the name 'np.NA' should be removed, leaving
np.IGNORED (with the same meaning as the current np.NA) and np.ABSENT
currently not implemented.  When it does get implemented, then, in due
course, make np.NA a synonym for np.ABSENT.  I'm sorry that wasn't
obvious.
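
To make the two names a little more concrete, here is a minimal sketch.
It rests on assumptions: on my reading of the alterNEP, IGNORED hides a
value behind a mask and leaves the stored data intact, while ABSENT
replaces the value itself.  Neither np.IGNORED nor np.ABSENT exists in
released NumPy, so numpy.ma stands in for IGNORED and NaN stands in for
ABSENT.  Note also that numpy.ma uses True to mean "masked", the
opposite sense to the a.mask[0:2] = False example quoted above.

```python
import numpy as np
import numpy.ma as ma

# IGNORED stand-in (assumption: numpy.ma approximates the mask semantics).
a = ma.array([1.0, 2.0, 3.0, 4.0])
a[0:2] = ma.masked           # hide the first two elements, do not overwrite
ignored_sum = a.sum()        # masked elements are skipped by the reduction
kept = a.data[0:2].copy()    # the underlying values are still stored
a.mask = ma.nomask           # clearing the mask exposes them again
restored_sum = a.sum()

# ABSENT stand-in (assumption: NaN approximates a bit-pattern missing value).
b = np.array([1.0, 2.0, 3.0, 4.0])
b[0:2] = np.nan              # the original values are overwritten
absent_sum = np.nansum(b)    # NaN-aware reduction skips them
```

The point of the sketch is only that the masked values can be recovered
and the overwritten ones cannot; it says nothing about how either name
would actually be implemented.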

> At this point I care much more about having a good implementation than
> exactly which one; the similarities are much more important than the
> differences. My main worry is we end up with nothing.

I don't think any of the proposed routes would end up with nothing.
Nathaniel was only suggesting backing off until we had done the work of
agreeing.  It doesn't look like that has much support; that's fine.

Best,

Matthew

