[Numpy-discussion] Numpy governance update
Thu Feb 16 12:09:27 CST 2012
On Thu, Feb 16, 2012 at 12:53 PM, Charles R Harris
> On Thu, Feb 16, 2012 at 9:56 AM, Nathaniel Smith <email@example.com> wrote:
>> On Thu, Feb 16, 2012 at 12:27 AM, Dag Sverre Seljebotn
>> <firstname.lastname@example.org> wrote:
>> > If non-contributing users came along on the Cython list demanding that
>> > we set up a system to select non-developers to sit on a board that would
>> > have discussions in order to veto pull requests, I don't know whether
>> > we'd ignore it or ridicule it or try to show some patience, but we
>> > certainly wouldn't take it seriously.
>> I'm not really worried about Continuum having some nefarious
>> "corporate" intent. But I am worried about how these plans will affect
>> numpy, and I think there are serious risks if we don't think about
>> process. Money has a dramatic effect on FOSS development, and not
>> always in a positive way, even when -- or *especially* when --
>> everyone has the best of intentions. I'm actually *more* worried about
>> altruistic full-time developers doing work on behalf of the community
>> than I am about developers who are working strictly in some company's
>> interest.
>> Finding a good design for software is like a nasty optimization
>> problem -- it's easy to get stuck in local maxima, and any one person
>> has only an imperfect, noisy estimate of the objective function. So
>> you need lots of eyes to catch mistakes, filter out the noise, and
>> explore multiple maxima in parallel.
>> The classic FOSS model of volunteer developers who are in charge of
>> project direction does a *great* job of solving this problem. (Linux
>> beat all the classic Unixen on technical quality, and it did it using
>> college students and volunteers -- it's not like Sun, IBM, HP etc.
>> couldn't afford better engineers! But they still lost.) Volunteers are
>> intimately familiar with the itch they're trying to scratch and the
>> trade-offs involved in doing so, and they need to work together to
>> produce anything major, so you get lots of different, high-quality
>> perspectives to help you figure out which approach is best.
> Linux is probably a bad choice as example here. Right up to about 2002 Linus
> was pretty much the only entry point into mainline as he applied all the
> patches by hand and reviewed all of them. This of course slowed Linux
> development considerably. I also had the opportunity to fix up some of the
> drivers for my own machine and can testify that the code quality of the
> patches was mixed. Now, of course, with 10000 or more patches going in
> during the open period of each development cycle, Linus relies on
> lieutenants to handle the subsystems, but he can be damn scathing when he
> takes an interest in some code and doesn't like what he sees. And he *can*
> be scathing, not just because he started the whole thing, but because he is
> darn good and the other developers respect that. But my point here is that
> Linus pretty much shapes Linux.
>> Developers who are working for some corporate interest alter this
>> balance, because in a "do-ocracy", someone who can throw a few
>> full-time developers at something suddenly has effectively
>> complete control over project direction. There's no moral problem here
>> when the "dictator" is benevolent, but suddenly you have an
>> informational bottleneck -- even benevolent dictators make mistakes,
>> and they certainly aren't omniscient. Even this isn't *so* bad though,
>> so long as the corporation is scratching their own itch -- at least
>> you can be pretty sure that whatever they produce will at least make
>> them happy, which implies a certain level of utility.
> Linus deals with this by saying, fork, fork, fork. Of course the GPL makes
> that a more viable response.
>> The riskiest case is paying developers to scratch someone else's itch.
>> IIUC, that's a major goal of Travis's here, to find a way to pay
>> developers to make numpy better for everyone. But, now you need some
>> way for the community to figure out what "better" means, because the
>> developers themselves don't necessarily know. It's not their itch
>> anymore. Running a poll or whatever might be a nice start, but we all
>> know how tough it is to extract useful design information from users.
>> You need a lot more than that if you want to keep the quality up.
>> Travis's proposal is that we go from a large number of self-selecting
>> people putting in little bits of time to a small number of designated
>> people putting in lots of time. There's a major win in terms of total
>> effort, but you inevitably lose a lot of diversity of viewpoints. My
>> feeling is it will only be a net win if the new employees put serious,
>> bend-over-backwards effort into taking advantage of the volunteer
>> community's wisdom.
>> This is why the NA discussion seems so relevant to me here -- everyone
>> involved absolutely had good intentions, excellent skills, etc., and
>> yet the outcome is still a huge unresolved mess. It was supposed to
>> make numpy more attractive for a certain set of applications, like
>> statistical analysis, where R is currently preferred. Instead, there
>> have been massive changes merged into numpy mainline, but most of the
>> intended "target market" for these changes is indifferent to them;
>> they don't solve the problem they're supposed to. And along the way
>> we've not just spent a bunch of Enthought's money, but also wasted
>> dozens of hours of volunteer time while seriously alienating some of
>> numpy's most dedicated advocates in that "target market". We could
>> debate about blame, and I'm sure there's plenty to spread around, but
>> I also think the fundamental problem isn't one of blame at all -- it's
>> that Mark, Charles and Travis *aren't* scratching an itch; AFAICT the
>> NA functionality is not something they actually need themselves. Which
>> means they're fighting uphill when trying to find the best solutions,
>> and haven't managed it yet. And were working on a deadline, to boot.
>> > It's obvious that one should try for consensus as long as possible,
>> > including listening to users. But in the very end, when agreement can't
>> > be reached by other means, the developers are the ones making the calls.
>> > (This is simply a consequence of their being the only ones who can
>> > credibly threaten to fork the project.)
>> > Sure, structures that include users in the process could be useful...
>> > but, if the devs are fine with the current situation (and I don't see
>> > Mark or Charles complaining), then I honestly think it is quite rude to
>> > not let the matter drop after the first ten posts or so.
>> I'm not convinced we need a formal governing body, but I think we
>> really, really need a community norm that takes consensus *very*
>> seriously. That principle is more important than who exactly enforces
>> it. I guess people are worried about that turning into obstructionism
>> or something, but seriously, this is a practical approach that works
>> well for lots of real actual successful FOSS projects.
>> I think it's also worth distinguishing between "users" and "developers
>> who happen not to be numpy core developers". There are lots of
>> experienced and skilled developers who spend their time on, say, scipy
>> or nipy or whatever, just because numpy already works for them. That
>> doesn't mean they don't have valuable insights or a stake in how numpy
>> develops going forward!
>> IMHO, everyone who can credibly participate in the technical
>> discussion should have a veto -- and should almost never use it. And
>> yes, that means volunteers should be able to screw up corporate
>> schedules if that's what's best for numpy-the-project. And, to be
>> clear, I'm not saying that random list-members somehow *deserve* to
>> screw around with generous corporate endowments; I'm saying that the
>> people running the corporation are going to be a lot happier in the
>> long run if they impose this rule on themselves.
> I'm more for the Linux model, Linus rules, the rest grovel ;)
I would feel a lot more comfortable with a BDFL that has code coverage
and ABI consistency high on his priority list, and not just getting the
greatest new features in as fast as possible.
numpy is quite a bit more in use now than xx years ago.
someone who learned numpy and scipy by working through bugs.