[Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Charles R Harris
Fri Oct 28 16:41:57 CDT 2011
On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <firstname.lastname@example.org> wrote:
> On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant <email@example.com> wrote:
> > I think Nathaniel and Matthew provided very
> > specific feedback that was helpful in understanding other perspectives of a
> > difficult problem. In particular, I really wanted bit-patterns
> > implemented. However, I also understand that Mark did quite a bit of work
> > and altered his original designs quite a bit in response to community
> > feedback. I wasn't a major part of the pull request discussion, nor did I
> > merge the changes, but I support Charles if he reviewed the code and felt
> > like it was the right thing to do. I likely would have done the same
> > rather than let Mark Wiebe's work languish.
> My connectivity is spotty this week, so I'll stay out of the technical
> discussion for now, but I want to share a story.
> Maybe a year ago now, Jonathan Taylor and I were debating what the
> best API for describing statistical models would be -- whether we
> wanted something like R's "formulas" (which I supported), or another
> approach based on sympy (his idea). To summarize, I thought his API
> was confusing, pointlessly complicated, and didn't actually solve the
> problem; he thought R-style formulas were superficially simpler but
> hopelessly confused and inconsistent underneath. Now, obviously, I was
> right and he was wrong. Well, obvious to me, anyway... ;-) But it
> wasn't like I could just wave a wand and make his arguments go away,
> no matter how annoying and wrong-headed I thought they were... I could
> write all the code I wanted but no-one would use it unless I could
> convince them it's actually the right solution, so I had to engage
> with him, and dig deep into his arguments.
> What I discovered was that (as I thought) R-style formulas *do* have a
> solid theoretical basis -- but (as he thought) all the existing
> implementations *are* broken and inconsistent! I'm still not sure I
> can actually convince Jonathan to go my way, but, because of his
> stubbornness, I had to invent a better way of handling these formulas,
> and so my library is actually the first implementation of these
> things that has a rigorous theory behind it, and in the process it
> avoids two fundamental, decades-old bugs in R. (And I'm not sure the R
> folks can fix either of them at this point without breaking a ton of
> code, since they both have API consequences.)
> It's extremely common for healthy FOSS projects to insist on consensus
> for almost all decisions, where consensus means something like "every
> interested party has a veto". This seems counterintuitive, because
> if everyone's vetoing all the time, how does anything get done? The
> trick is that if anyone *can* veto, then vetoes turn out to actually
> be very rare. Everyone knows that they can't just ignore alternative
> points of view -- they have to engage with them if they want to get
> anything done. So you get buy-in on features early, and no vetoes are
> necessary. And by forcing people to engage with each other, like me
> with Jonathan, you get better designs.
> But what about the cost of all that code that doesn't get merged, or
> written, because everyone's spending all this time debating instead?
> Better designs are nice and all, but how does that justify letting
> working code languish?
> The greatest risk for a FOSS project is that people will ignore you.
> Projects and features live and die by community buy-in. Consider the
> "NA mask" feature right now. It works (at least the parts of it that
> are implemented). It's in mainline. But IIRC, Pierre said last time
> that he doesn't think the current design will help him improve or
> replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring
> this feature in favor of his library pandas' current hacky NA support.
> Members of the neuroimaging crowd are saying that the memory overhead
> is too high and the benefits too marginal, so they'll stick with NaNs.
> Together these folks are a huge proportion of this feature's target
> audience. So what have we actually accomplished by merging this to
> mainline? Are we going to be stuck supporting a feature that only a
> fraction of the target audience actually uses? (Maybe they're being
> dumb, but if people are ignoring your code for dumb reasons... they're
> still ignoring your code.)
> The consensus rule forces everyone to do the hardest and riskiest part
> -- building buy-in -- up front. Because you *have* to do it sooner or
> later, and doing it sooner doesn't just generate better designs. It
> drastically reduces the risk of ending up in a huge trainwreck.
> In my story at the beginning, I wished I had a magic wand to skip this
> annoying debate and political stuff. But giving it to me would have
> been a bad idea. I think that's what went wrong with the NA discussion in
> the first place. Mark's an excellent programmer, and he tried his best
> to act in the good of everyone in the project -- but in the end, he
> did have a wand like that. He didn't have that sense that he *had* to
> get everyone on board (even the people who were saying dumb things),
> or he'd just be wasting his time. He didn't ask Pierre if the NA
> design would actually work for numpy.ma's purposes -- I did.
> You may have noticed that I do have some ideas about how NA
> support should work. But my ideas aren't really the important thing.
> The alter-NEP was my attempt to find common ground between the
> different needs people were bringing up, so we could discuss whether
> it would work for people or not. I'm not wedded to anything in it. But
> this is a complicated issue with a lot of conflicting interests, and
> we need to find something that actually does work for everyone (or as
> large a subset as is practical).
> So here's what I think we should do:
> 1) I will submit a pull request backing Mark's NA work out of
> mainline, for now. (This is more or less done, I just need to get it
> onto github, see above re: connectivity)
> 2) I will also put together a new branch containing that work,
> rebased against current mainline, so it doesn't get lost. (Ditto.)
> 3) And we'll decide what to do with it *after* we hammer out a
> design that the various NA-supporting groups all find convincing. Or
> at least a design for some of the less controversial pieces (like the
> 'where=' ufunc argument?), get those merged, and then iterate.
> What do you all think?
Why don't you and Matthew work up an alternative implementation so we can
compare the two?