[Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Charles R Harris
Fri Oct 28 17:14:53 CDT 2011
On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett <firstname.lastname@example.org> wrote:
> On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett <email@example.com> wrote:
> > Hi,
> > On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris
> > <firstname.lastname@example.org> wrote:
> >> On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <email@example.com> wrote:
> >>> On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant wrote:
> >>> > I think Nathaniel and Matthew provided very specific feedback that
> >>> > was helpful in understanding other perspectives of a difficult
> >>> > problem. In particular, I really wanted bit-patterns
> >>> > implemented. However, I also understand that Mark did quite a bit
> >>> > of work and altered his original designs quite a bit in response to
> >>> > community feedback. I wasn't a major part of the pull request
> >>> > discussion, nor did I merge the changes, but I support Charles if he
> >>> > reviewed the code and felt like it was the right thing to do. I
> >>> > likely would have done the same thing rather than let Mark Wiebe's
> >>> > work languish.
> >>> My connectivity is spotty this week, so I'll stay out of the technical
> >>> discussion for now, but I want to share a story.
> >>> Maybe a year ago now, Jonathan Taylor and I were debating what the
> >>> best API for describing statistical models would be -- whether we
> >>> wanted something like R's "formulas" (which I supported), or another
> >>> approach based on sympy (his idea). To summarize, I thought his API
> >>> was confusing, pointlessly complicated, and didn't actually solve the
> >>> problem; he thought R-style formulas were superficially simpler but
> >>> hopelessly confused and inconsistent underneath. Now, obviously, I was
> >>> right and he was wrong. Well, obvious to me, anyway... ;-) But it
> >>> wasn't like I could just wave a wand and make his arguments go away,
> >>> no matter how annoying and wrong-headed I thought they were... I could
> >>> write all the code I wanted but no-one would use it unless I could
> >>> convince them it's actually the right solution, so I had to engage
> >>> with him, and dig deep into his arguments.
> >>> What I discovered was that (as I thought) R-style formulas *do* have a
> >>> solid theoretical basis -- but (as he thought) all the existing
> >>> implementations *are* broken and inconsistent! I'm still not sure I
> >>> can actually convince Jonathan to go my way, but, because of his
> >>> stubbornness, I had to invent a better way of handling these formulas,
> >>> and so my library is actually the first implementation of these
> >>> things that has a rigorous theory behind it, and in the process it
> >>> avoids two fundamental, decades-old bugs in R. (And I'm not sure the R
> >>> folks can fix either of them at this point without breaking a ton of
> >>> code, since they both have API consequences.)
> >>> --
> >>> It's extremely common for healthy FOSS projects to insist on consensus
> >>> for almost all decisions, where consensus means something like "every
> >>> interested party has a veto". This seems counterintuitive, because
> >>> if everyone's vetoing all the time, how does anything get done? The
> >>> trick is that if anyone *can* veto, then vetoes turn out to actually
> >>> be very rare. Everyone knows that they can't just ignore alternative
> >>> points of view -- they have to engage with them if they want to get
> >>> anything done. So you get buy-in on features early, and no vetoes are
> >>> necessary. And by forcing people to engage with each other, like me
> >>> with Jonathan, you get better designs.
> >>> But what about the cost of all that code that doesn't get merged, or
> >>> written, because everyone's spending all this time debating instead?
> >>> Better designs are nice and all, but how does that justify letting
> >>> working code languish?
> >>> The greatest risk for a FOSS project is that people will ignore you.
> >>> Projects and features live and die by community buy-in. Consider the
> >>> "NA mask" feature right now. It works (at least the parts of it that
> >>> are implemented). It's in mainline. But IIRC, Pierre said last time
> >>> that he doesn't think the current design will help him improve or
> >>> replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring
> >>> this feature in favor of his library pandas' current hacky NA support.
> >>> Members of the neuroimaging crowd are saying that the memory overhead
> >>> is too high and the benefits too marginal, so they'll stick with NaNs.
> >>> Together these folks are a huge proportion of this feature's target
> >>> audience. So what have we actually accomplished by merging this to
> >>> mainline? Are we going to be stuck supporting a feature that only a
> >>> fraction of the target audience actually uses? (Maybe they're being
> >>> dumb, but if people are ignoring your code for dumb reasons... they're
> >>> still ignoring your code.)
> >>> The consensus rule forces everyone to do the hardest and riskiest part
> >>> -- building buy-in -- up front. Because you *have* to do it sooner or
> >>> later, and doing it sooner doesn't just generate better designs. It
> >>> drastically reduces the risk of ending up in a huge trainwreck.
> >>> --
> >>> In my story at the beginning, I wished I had a magic wand to skip this
> >>> annoying debate and political stuff. But giving it to me would have
> >>> been a bad idea. I think that's what went wrong with the NA discussion in
> >>> the first place. Mark's an excellent programmer, and he tried his best
> >>> to act in the good of everyone in the project -- but in the end, he
> >>> did have a wand like that. He didn't have that sense that he *had* to
> >>> get everyone on board (even the people who were saying dumb things),
> >>> or he'd just be wasting his time. He didn't ask Pierre if the NA
> >>> design would actually work for numpy.ma's purposes -- I did.
> >>> You may have noticed that I do have some ideas about how NA
> >>> support should work. But my ideas aren't really the important thing.
> >>> The alter-NEP was my attempt to find common ground between the
> >>> different needs people were bringing up, so we could discuss whether
> >>> it would work for people or not. I'm not wedded to anything in it. But
> >>> this is a complicated issue with a lot of conflicting interests, and
> >>> we need to find something that actually does work for everyone (or as
> >>> large a subset as is practical).
> >>> So here's what I think we should do:
> >>> 1) I will submit a pull request backing Mark's NA work out of
> >>> mainline, for now. (This is more or less done, I just need to get it
> >>> onto github, see above re: connectivity)
> >>> 2) I will also put together a new branch containing that work,
> >>> rebased against current mainline, so it doesn't get lost. (Ditto.)
> >>> 3) And we'll decide what to do with it *after* we hammer out a
> >>> design that the various NA-supporting groups all find convincing. Or
> >>> at least a design for some of the less controversial pieces (like the
> >>> 'where=' ufunc argument?), get those merged, and then iterate
> >>> incrementally.
> >>> What do you all think?
> >> Why don't you and Matthew work up an alternative implementation so we
> >> can compare the two?
> > Do you have comments on the changes I suggested?
> Sorry - this was too short and a little rude. I'm sorry.
> I was reacting to what I perceived as intolerance for discussing the
> issues, and I may be wrong in that perception.
> I think what Nathaniel is saying, is that it is not in the best
> interests of numpy to push through code where there is not good
> agreement. In reverting the change, he is, I think, appealing for a
> commitment to that process, for the good of numpy.
> I have in the past taken some of your remarks to imply that if someone
> is prepared to write code then that overrides most potential
> The reason I think Nathaniel is the more right, is because most of us,
> I believe, do honestly have the interests of numpy at heart, and, want
> to fully understand the problem, and are prepared to be proven wrong.
> In that situation, in my experience of writing code at least, by far
> the most fruitful way to proceed is by letting all voices be heard.
> On the other hand, if the rule becomes 'unless I see an implementation
> I'm not listening to you' - then we lose the great benefits, to the
> code, of having what is fundamentally a good and strong community.
Matthew, the problem I have is that it seems that you and Nathaniel won't be
satisfied unless things are done *your* way. To use your terminology, that
comes across as a lack of respect for the rest of us. In order to reach
consensus, some folks are going to have to give. I think Mark gave a lot, I
don't see that from the two of you. Wanting reversion at this point, even
when Nathaniel doesn't seem to have used the current implementation much --
if any -- might be considered arrogant by some. Asking that you put some
skin in the game by devoting substantial time to an alternate implementation
doesn't strike me as out of line.