[Numpy-discussion] What is consensus anyway
Sun Apr 22 17:15:04 CDT 2012
If you hang around big FOSS projects, you'll see the word "consensus"
come up a lot. For example, the glibc steering committee recently
dissolved itself in favor of governance "directly by the consensus of
the people active in glibc development". It's the governing rule of
the IETF, which defines many of the most important internet
standards. It is the "primary way decisions are made on
Wikipedia". It's "one of the fundamental aspects of accomplishing
things within the Apache framework".
But it turns out that this "consensus" thing is actually somewhat
mysterious, and most programmers immersed in this culture
pick it up by osmosis. And numpy in particular has a lot of developers
who are not coming from a classic FOSS programmer background! So this
is my personal attempt to articulate what it is, and why requiring
consensus is probably the best possible approach to project decision making.
So what is "consensus"? Like, voting or something?
This is surprisingly subtle and specific.
"Consensus" means something like, "everyone who cares is satisfied
with the result".
It does *not* mean
* Every opinion counts equally
* We vote on anything
* Every solution must be perfect and flawless
* Every solution must leave everyone overjoyed
* Everyone must sign off on every solution.
It *does* mean
* We invite people to speak up
* We generally trust individuals to decide how important their opinion is
* We generally trust individuals to decide whether or not they can
live with some outcome
* If they can't, then we take the time to find something better.
One simple way of stating this is, everyone has a veto. In practice,
such vetoes are almost never used, so this rule is not particularly
illuminating on its own. Hence, the rest of this document.
What a waste of time! That all sounds very pretty on paper, but we
have stuff to get done.
First, I'll note that this seemingly utopian scheme has a track record
of producing such impractical systems as TCP/IP, SMTP, DNS, Apache,
GCC, Linux, Samba, Python, ...
But mere empirical results are often less convincing than a good
story, so I will give you two. Why does a requirement for consensus work?
Reason 1 (for optimists): *All of us are smarter than any of us.* For
a complex project with many users, it's extraordinarily difficult for
any one person to understand the full ramifications of any decision,
particularly the sort of far-reaching architectural decisions that are
most important. It's even more difficult to understand all the
possibilities of all the different possible solutions. In fact, it's
*extremely* common that the correct solution to a problem is the one
that no-one thinks of until after a month of annoying debate. Spending
a month to avoid an architectural problem that will haunt us for years
is an *excellent* trade-off, even if it feels interminable at the
time. Even two months. Usually disagreements are an indication that a
better solution is possible, even when it's not clear what that would be.
Reason 2 (for pessimists): *You **will** reach consensus sooner or
later; it's less painful to do up front.* Example: NA handling. There
are two schemes that people use for this right now -- numpy.ma and
ugly NaN kluges (see e.g. nanmean). These are generally agreed to be
suboptimal. Recently, two new contenders have shown up: the NEP
masked-NA support currently in master, and the unrelated NA support in
pandas (which as a library is attracting a *lot* of the statistical
analysis folk who care about missing data, kudos to Wes). I think that
right now, the most likely future is that a few years from now, many
people will still be using the old solutions, and others will have
switched to the new (incompatible) solutions, and we will have *4*
suboptimal schemes in concurrent use. If (when) this happens, we will
have to re-open this discussion yet again, but now with a heck of a
mess to clean up. This is FOSS -- if people aren't convinced by your
solution, they will just ignore it and do their own thing. So a policy
that allows changes to be made without consensus is a recipe for
entrenching disagreements and splitting the community.
Okay, great, but even if it's the best thing ever, we *can't* hold a
vote on every change! What are you actually suggesting we do?
Right, that's not the idea. Most changes are pretty obviously
uncontroversial, and in fact we usually have the opposite problem --
it's hard to get people to do code review!
So having consensus on every change is an ideal, and in practice, just
following the reasonable person principle lets us get pretty close to
that ideal. If no-one objects to a change, then it's probably fine.
(And it's not like anyone *wants* that segfault to remain unfixed!
That's obvious.) OTOH, this isn't an excuse to try gaming the system
-- if you have a change that might affect people adversely, then it
can be worthwhile to send them a ping, even if they didn't object.
They might have just missed seeing it go by, and if it's going to be a
problem, better to find out now!
In fact, one of the nice things about having a consistent culture of
consensus-building is that people learn to trust that if they do have
a problem, it will be taken seriously. And that, in turn, makes it
okay to make judgement calls about whether the participants in some
discussion basically agree, or whether the apparent disagreement is
just bikeshedding, or whatever. If you make the wrong judgement
call, then someone will tell you, and no harm is done. If you do not
have such a culture, then people may (will) despair of being taken
seriously, and go do something more pleasant and productive, like
fire-walking or coding in Matlab.
So mainly what I'm saying we should do is:
1. Make it as easy as possible for people to see what's going on and
join the discussion. All decisions and reasoning behind decisions take
place in public. (On this note, it would be *really* good if pull
request notifications went to the list.)
2. If someone raises a substantive objection, take that seriously.
3. If someone says "no, this is just not going to work for me,
because... <something substantive here>", then it can't go in.
It turns out that since we are all human, it's much easier to take
people's concerns seriously when you know that they can veto your
code! The result is that in practice, disagreements get resolved at
point (2) there, and no-one feels the need to take such extreme
measures as vetoing anything. I'm as lazy as anyone else; I produce
better solutions when I'm forced to keep looking for them.
So if we just follow this consensus stuff, everything will be perfect?
Ha ha ha no. It can and does work better than any other options I'm
aware of, but it takes practice and there are certainly still failure
modes. Here's some that come to mind:
* Bikeshedding: Mentioned above. There's nothing *wrong* with everyone
speaking up to voice their opinion, but it shouldn't be an obstacle to
getting work done when in fact everyone will be satisfied regardless.
* False compromise: if all you're trying to do is make everybody
happy, then it's easy to end up with a "compromise" that takes one bit
from each proposal (or simply takes the union of them all). This is
usually worse than any of the original proposals. Good designs take
work; skipping that work is tempting; resist it.
You also can end up with some surprising solutions, e.g.:
* The group reaches consensus that the problem is well understood and
there simply is no perfect solution, but something is better than
nothing. So everyone agrees to, in effect, flip a coin. (Example: the
Python ternary operator)
* The group reaches consensus that while other points of view may be
valid, this project only has room for one of them, sorry, anyone who
disagrees will have to start their own project. (Example: GPL versus
* The group reaches consensus that at least one piece of a larger
proposal is okay to start, which puts off the show-down over the rest
of it until another day. (At which point more data may be available,
opinions may have changed, or the whole issue may have become
Remember, the goal is always to find some way forward that we can
collectively live with. Sometimes you can successfully convince
everyone of the intrinsic awesomeness of your original idea through
argument alone... but clever outside-the-box proposals often pay off.
But what about obstructive people abusing their veto power?
This concern makes perfect sense, but it turns out to just not be as
much of a problem as often as you'd think. Most people have more
interesting things to do with their lives than to gum up the mailing
list for some random software library. It's a fair assumption that if
someone cares enough to speak up, it's because they have some
legitimate interest in numpy's future. And again, energy spent on
trying to sniff out obstructionists can usually be more profitably
spent on finding better solutions.
That said, yes, sometimes people may be obstructive or act in bad
faith. Here's some good experience-based advice on this:
Notice that everything they say is still oriented around consensus --
"You may not persuade the person in question, but that's okay as long
as you persuade everyone else", "a perfect example of how to build a
strong case on neutral, quantitative data", the "masterful strategy"
is to build consensus before acting. The consensus ideal is perfectly
compatible with dealing with difficult people.
Like I said at the start, this is just my attempt to distill some
abstract principles from my own experience. I can't take credit for
most of these insights, and no doubt I've articulated some of them
poorly. Fortunately, we don't all have to agree on every detail to get
things done :-). But if you want to read more about this topic, and
from other perspectives, here are some decent documents:
And, of course, I would love to hear feedback on this document!