[Numpy-discussion] consensus (was: NA masks in the next numpy release?)

Matthew Brett matthew.brett@gmail....
Fri Oct 28 18:37:42 CDT 2011


Hi,

On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers
<ralf.gommers@googlemail.com> wrote:
>
>
> On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett <matthew.brett@gmail.com>
> wrote:
>>
>> Hi,
>>
>> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris
>> <charlesr.harris@gmail.com> wrote:
>> >
>> >
>> > On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett <matthew.brett@gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett
>> >> <matthew.brett@gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris
>> >> > <charlesr.harris@gmail.com> wrote:
>> >> >>
>> >> >>
>> >> >> On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <njs@pobox.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant
>> >> >>> <oliphant@enthought.com>
>> >> >>> wrote:
>> >> >>> > I think Nathaniel and Matthew provided very specific feedback that
>> >> >>> > was helpful in understanding other perspectives of a difficult
>> >> >>> > problem.  In particular, I really wanted bit-patterns implemented.
>> >> >>> > However, I also understand that Mark did quite a bit of work and
>> >> >>> > altered his original designs quite a bit in response to community
>> >> >>> > feedback.  I wasn't a major part of the pull request discussion,
>> >> >>> > nor did I merge the changes, but I support Charles if he reviewed
>> >> >>> > the code and felt like it was the right thing to do.  I likely
>> >> >>> > would have done the same thing rather than let Mark Wiebe's work
>> >> >>> > languish.
>> >> >>>
>> >> >>> My connectivity is spotty this week, so I'll stay out of the
>> >> >>> technical
>> >> >>> discussion for now, but I want to share a story.
>> >> >>>
>> >> >>> Maybe a year ago now, Jonathan Taylor and I were debating what the
>> >> >>> best API for describing statistical models would be -- whether we
>> >> >>> wanted something like R's "formulas" (which I supported), or
>> >> >>> another
>> >> >>> approach based on sympy (his idea). To summarize, I thought his API
>> >> >>> was confusing, pointlessly complicated, and didn't actually solve
>> >> >>> the
>> >> >>> problem; he thought R-style formulas were superficially simpler but
>> >> >>> hopelessly confused and inconsistent underneath. Now, obviously, I
>> >> >>> was
>> >> >>> right and he was wrong. Well, obvious to me, anyway... ;-) But it
>> >> >>> wasn't like I could just wave a wand and make his arguments go
>> >> >>> away,
>> >> >>> no matter how annoying and wrong-headed I thought they were... I
>> >> >>> could
>> >> >>> write all the code I wanted but no-one would use it unless I could
>> >> >>> convince them it's actually the right solution, so I had to engage
>> >> >>> with him, and dig deep into his arguments.
>> >> >>>
>> >> >>> What I discovered was that (as I thought) R-style formulas *do*
>> >> >>> have a
>> >> >>> solid theoretical basis -- but (as he thought) all the existing
>> >> >>> implementations *are* broken and inconsistent! I'm still not sure I
>> >> >>> can actually convince Jonathan to go my way, but, because of his
>> >> >>> stubbornness, I had to invent a better way of handling these
>> >> >>> formulas,
>> >> >>> and so my library[1] is actually the first implementation of these
>> >> >>> things that has a rigorous theory behind it, and in the process it
>> >> >>> avoids two fundamental, decades-old bugs in R. (And I'm not sure
>> >> >>> the R
>> >> >>> folks can fix either of them at this point without breaking a ton
>> >> >>> of
>> >> >>> code, since they both have API consequences.)
>> >> >>>
>> >> >>> --
>> >> >>>
>> >> >>> It's extremely common for healthy FOSS projects to insist on
>> >> >>> consensus
>> >> >>> for almost all decisions, where consensus means something like
>> >> >>> "every
>> >> >>> interested party has a veto"[2]. This seems counterintuitive,
>> >> >>> because
>> >> >>> if everyone's vetoing all the time, how does anything get done? The
>> >> >>> trick is that if anyone *can* veto, then vetoes turn out to
>> >> >>> actually
>> >> >>> be very rare. Everyone knows that they can't just ignore
>> >> >>> alternative
>> >> >>> points of view -- they have to engage with them if they want to get
>> >> >>> anything done. So you get buy-in on features early, and no vetoes
>> >> >>> are
>> >> >>> necessary. And by forcing people to engage with each other, like me
>> >> >>> with Jonathan, you get better designs.
>> >> >>>
>> >> >>> But what about the cost of all that code that doesn't get merged,
>> >> >>> or
>> >> >>> written, because everyone's spending all this time debating
>> >> >>> instead?
>> >> >>> Better designs are nice and all, but how does that justify letting
>> >> >>> working code languish?
>> >> >>>
>> >> >>> The greatest risk for a FOSS project is that people will ignore
>> >> >>> you.
>> >> >>> Projects and features live and die by community buy-in. Consider
>> >> >>> the
>> >> >>> "NA mask" feature right now. It works (at least the parts of it
>> >> >>> that
>> >> >>> are implemented). It's in mainline. But IIRC, Pierre said last time
>> >> >>> that he doesn't think the current design will help him improve or
>> >> >>> replace numpy.ma. Up-thread, Wes McKinney is leaning towards
>> >> >>> ignoring
>> >> >>> this feature in favor of his library pandas' current hacky NA
>> >> >>> support.
>> >> >>> Members of the neuroimaging crowd are saying that the memory
>> >> >>> overhead
>> >> >>> is too high and the benefits too marginal, so they'll stick with
>> >> >>> NaNs.
>> >> >>> Together these folk are a huge proportion of this feature's target
>> >> >>> audience. So what have we actually accomplished by merging this to
>> >> >>> mainline? Are we going to be stuck supporting a feature that only a
>> >> >>> fraction of the target audience actually uses? (Maybe they're being
>> >> >>> dumb, but if people are ignoring your code for dumb reasons...
>> >> >>> they're
>> >> >>> still ignoring your code.)
>> >> >>>
>> >> >>> The consensus rule forces everyone to do the hardest and riskiest
>> >> >>> part
>> >> >>> -- building buy-in -- up front. Because you *have* to do it sooner
>> >> >>> or
>> >> >>> later, and doing it sooner doesn't just generate better designs. It
>> >> >>> drastically reduces the risk of ending up in a huge trainwreck.
>> >> >>>
>> >> >>> --
>> >> >>>
>> >> >>> In my story at the beginning, I wished I had a magic wand to skip
>> >> >>> this
>> >> >>> annoying debate and political stuff. But giving it to me would have
>> >> >>> been a bad idea. I think that's what went wrong with the NA discussion
>> >> >>> in
>> >> >>> the first place. Mark's an excellent programmer, and he tried his
>> >> >>> best
>> >> >>> to act in the good of everyone in the project -- but in the end, he
>> >> >>> did have a wand like that. He didn't have that sense that he *had*
>> >> >>> to
>> >> >>> get everyone on board (even the people who were saying dumb
>> >> >>> things),
>> >> >>> or he'd just be wasting his time. He didn't ask Pierre if the NA
>> >> >>> design would actually work for numpy.ma's purposes -- I did.
>> >> >>>
>> >> >>> You may have noticed that I do have some ideas about how NA
>> >> >>> support should work. But my ideas aren't really the important
>> >> >>> thing.
>> >> >>> The alter-NEP was my attempt to find common ground between the
>> >> >>> different needs people were bringing up, so we could discuss
>> >> >>> whether
>> >> >>> it would work for people or not. I'm not wedded to anything in it.
>> >> >>> But
>> >> >>> this is a complicated issue with a lot of conflicting interests,
>> >> >>> and
>> >> >>> we need to find something that actually does work for everyone (or
>> >> >>> as
>> >> >>> large a subset as is practical).
>> >> >>>
>> >> >>> So here's what I think we should do:
>> >> >>>  1) I will submit a pull request backing Mark's NA work out of
>> >> >>> mainline, for now. (This is more or less done, I just need to get
>> >> >>> it
>> >> >>> onto github, see above re: connectivity)
>> >> >>>  2) I will also put together a new branch containing that work,
>> >> >>> rebased against current mainline, so it doesn't get lost. (Ditto.)
>> >> >>>  3) And we'll decide what to do with it *after* we hammer out a
>> >> >>> design that the various NA-supporting groups all find convincing.
>> >> >>> Or
>> >> >>> at least a design for some of the less controversial pieces (like
>> >> >>> the
>> >> >>> 'where=' ufunc argument?), get those merged, and then iterate
>> >> >>> incrementally.
>> >> >>>
>> >> >>> What do you all think?
>> >> >>>
>> >> >>
>> >> >> Why don't you and Matthew work up an alternative implementation so
>> >> >> we
>> >> >> can
>> >> >> compare the two?
>> >> >
>> >> > Do you have comments on the changes I suggested?
>> >>
>> >> Sorry - this was too short and a little rude.  I'm sorry.
>> >>
>> >> I was reacting to what I perceived as intolerance for discussing the
>> >> issues, and I may be wrong in that perception.
>> >>
>> >> I think what Nathaniel is saying is that it is not in the best
>> >> interests of numpy to push through code where there is not good
>> >> agreement.  In reverting the change, he is, I think, appealing for a
>> >> commitment to that process, for the good of numpy.
>> >>
>> >> I have in the past taken some of your remarks to imply that if someone
>> >> is prepared to write code then that overrides most potential
>> >> disagreement.
>> >>
>> >> The reason I think Nathaniel is more right is that most of us,
>> >> I believe, do honestly have the interests of numpy at heart, want
>> >> to fully understand the problem, and are prepared to be proven wrong.
>> >> In that situation, in my experience of writing code at least, by far
>> >> the most fruitful way to proceed is by letting all voices be heard.
>> >> On the other hand, if the rule becomes 'unless I see an implementation
>> >> I'm not listening to you' - then we lose the great benefits, to the
>> >> code, of having what is fundamentally a good and strong community.
>> >>
>> >
>> > Matthew, the problem I have is that it seems that you and Nathaniel
>> > won't be
>> > satisfied unless things are done *your* way. To use your terminology,
>> > that
>> > comes across as a lack of respect for the rest of us. In order to reach
>> > consensus, some folks are going to have to give.
>>
>> No, that's not what Nathaniel and I are saying at all. Nathaniel was
>> pointing to links for projects that care that everyone agrees before
>> they go ahead.
>
> It looked to me like there was a serious intent to come to an agreement, or
> at least closer together. The discussion in the summer was going around in
> circles though, and was too abstract and complex to follow. Therefore Mark's
> choice of implementing something and then asking for feedback made sense to
> me.

I should point out that the implementation hasn't - as far as I can
see - changed the discussion.  The discussion was about the API.
Implementations are useful for agreed APIs because they can point out
where the API does not make sense or cannot be implemented.  In this
case, Mark implemented the API he said he was going to implement, at
least as far as I can see.  Again, I'm happy to be corrected.

>> In saying that we are insisting on our way, you are saying, implicitly, 'I
>> am not going to negotiate'.
>
> That is only your interpretation. The observation that Mark compromised
> quite a bit while you didn't seems largely correct to me.

The problem here stems from our inability to work towards agreement,
rather than standing on set positions.  I set out what changes I think
would make the current implementation OK.  Can we please, please have
a discussion about those points instead of trying to argue about who
has given more ground?

> That commitment would of course be good. However, even if that were possible
> before writing code and everyone agreed that the ideas of you and Nathaniel
> should be implemented in full, it's still not clear that either of you would
> be willing to write any code. Agreement without code still doesn't help us
> very much.

I'm going to return to Nathaniel's point - it is a highly valuable
thing to set ourselves the target of resolving substantial discussions
by consensus.   The route you are endorsing here is 'implementor
wins'.   We don't need to do it that way.  We're a mature, sensible
bunch of adults who can talk out the issues until we agree they are
ready for implementation, and then implement.  That's all Nathaniel is
saying.  I think he's obviously right, and I'm sad that it isn't as
clear to y'all as it is to me.

Best,

Matthew

