[Numpy-discussion] Numpy governance update
Thu Feb 16 15:11:19 CST 2012
This has been a clarifying discussion for some people. I'm glad people are speaking up. I believe in the value of consensus and the value of users' opinions. I want to make sure that people who use NumPy and haven't yet learned how to contribute feel like they have a voice. I have always been very open about adding people to the lists that I have influence over and giving people permissions to contribute even when they disagree with me. I recognize the value of multiple points of view.
That is why in addition to creating the company (with a goal to allow at least some people to spend their day-job working on NumPy), I've pushed to organize a Foundation whose essential mission is to make sure that the core tools used for Python in Science stay open, maintained, and available. I will work very hard to do all I can to make these ventures successful. I had thought I would be able to spend more time on NumPy and SciPy over the past 4 years. This did not work out --- which is why I made a career change. All I can point to is my previous work and say thank you to all who have done so much for the communities I have been able to participate in.
I believe in the power of community development, but I also believe in the power of directed development towards solving people's problems in an open market where people can choose to either interact with the provider or find another supplier. Having two organizations that I support helps me direct my energies towards both of those values. I resonate with Linus's individual leanings. I'm not a big fan of design-by-committee as I haven't seen it be very successful in creating new technologies. It is pretty good at enforcing the status quo. If I felt like that is what NumPy needed I would be fine with it.
However, I feel that NumPy is going to be surpassed by other solutions if steps are not taken to improve the code-base *and* add new features. I'm very interested in discussions about how this work is to be accomplished. I agree with Mark that this discussion will be more useful in 6 months when we have made it easier for more people to get involved with core code development.
At the end of the day it is about people and what they spend their time doing. Whatever I do, inside or outside the community, people are free to accept or reject. I can only promise to do my best. It's all I ask of everyone I work with.
It is gratifying to see that NumPy has become a well-used project and that there are significant numbers of stake-holders who want to see the project continue to succeed and be useful for them. My goal with the Foundation, with the Company and with my professional life is to see that the growth of Python in Science, Technology, and Data Analysis continues and even accelerates.
My view right now is similar to Mark's in that we don't have enough core developers. Charles and Ralf and Pauli and David before them have done an amazing job at "pruning, cleaning, and maintaining what is there". Obviously, Jim, Perry, Todd, Rick, Konrad, David A., and Paul Dubois have had significant impact before them. NumPy has always been a community project, but it needs some energy put into it. As someone who is intimately familiar with the code-base (having worked on Numeric as early as 1998 as well as having been part of many discussions about Scientific Computing on Python), I'm trying to infuse that energy as best I can. NumPy has a chance to be far more than it is. There are people using inferior solutions because of missing features in NumPy and the lack of awareness of how to use NumPy. There are other important use-cases that NumPy is an "almost-there" solution for. As it solves these problems, even more users will come to our community, and there needs to be a way to hear their voice as well.
Just for the record, I don't believe the NA discussion has been finalized. In fact, the NA discussion this summer was one of the factors that led to my decision to put myself back into NumPy development full time --- I just had to figure out how to do it in a way that my family could accept. I think I could have contributed to that discussion as someone who understands both the code and how it is and has been used.
For the next 6-12 months, I am comfortable taking the "benevolent dictator" role. During that time, I hope we can find many more core developers and then re-visit the discussion. My view is that design decisions should be a consensus based on current contributors to the code base and major users. To continue to be relevant, NumPy has to serve its customers. They are the ones who will have the final say. If others feel like they can do better, a fork is an option. I don't want that to happen, but it is the only effective and practical "governance" structure that exists in my mind outside of the self-governance of the people that participate.
I and others that I work with will be working on code that we plan to put into NumPy. Some of this will be bug-fixes, code-cleanup, and new tests. Some of this will be new features --- features that should have been there from the beginning (if I had understood the use-cases then like I do now). Some of this work will be features that have already been proposed and talked about but for which nobody has stepped up to write the code (group-by, meta-data, labeled-arrays, etc.). All of the major features will be proposed on this list, and we will use the github process.
I do care about code-quality, tests, and maintenance issues. Now that I am putting some actual resources into the project (and not just late-nights and stolen hours from my academic career and full-time job), I can actually put some energy and money behind those things --- along with new features.
We need build-bots and a good issue-tracking system. I have applied to JetBrains for NumPy to have free access to YouTrack and TeamCity. We are looking for machines to host the build-systems on. Some people have approached me already volunteering to help. All of this help will be graciously accepted. The task before us is large, not small. It will require people working together, trusting each other, and looking for ways to find common ground instead of ways to disagree.
No organizational structure can make up for the lack of great people putting their hearts and efforts into a great cause. At the AMS, SIAM, and other scientific meetings, I have seen hundreds of scientists using NumPy to analyze their large data-sets, and needing threading support, labels, units, data-persistence and management, and the ability to perform data-base-like queries on their data. On Wall Street I have seen quants use NumPy to quickly analyze trading data and create systems for risk analysis. In marketing companies, I have seen NumPy get used (somewhat inefficiently) to manage customer lists and do analysis about where to build retail stores. In several other companies, I have watched NumPy get used to analyze data coming from instruments and determine engineering direction based on those data. I have watched NumPy get used by insurance companies trying to charge a more effective premium.
I have also seen people write solutions without NumPy --- because they don't understand the power of array-oriented computing, or have enough math to be comfortable with thinking of an array of data as a single *thing*. I've seen software architectures develop in the data-base world without awareness of NumPy (and its ancestors J and APL) and seen people struggle with maintaining bulky solutions that could be a few lines of array-code. I've seen people write compilers for Python while ignoring the NumPy use case and then later trying to bolt it back on.
I finally feel that I've gained enough experience and awareness to know what NumPy can and should be. I recognize that others have contributed to NumPy and SciPy, and I also recognize that people with great skill will want to comment on proposals. I will do my best to listen. I will encourage those I work with and have any influence over to do the same.
There are literally thousands of use-cases that NumPy can help people with. NumPy needs life-blood to make it go where it needs to go. All of this will happen in the full light of day. People will be free to comment, complain, argue, and ultimately fork if they don't like what we are doing. The NumPy developers will disagree and not everything I want will happen. I've been over-ruled before --- I expect it will happen in the future.
The door is open to all who want to contribute. It remains so. There is a lot that needs to be done. I appreciate the concerns that have been raised and the people that have raised them. My limited energies remain devoted to improving the NumPy code, building Continuum, and building the Foundation to be able to support all of the Python for Science projects that it possibly can. Eventually, perhaps, I can even participate substantially with SciPy again --- where all of this started for me in 1998.
On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote:
> On Thu, Feb 16, 2012 at 12:27 AM, Dag Sverre Seljebotn
> <firstname.lastname@example.org> wrote:
>> If non-contributing users came along on the Cython list demanding that
>> we set up a system to select non-developers along on a board that would
>> have discussions in order to veto pull requests, I don't know whether
>> we'd ignore it or ridicule it or try to show some patience, but we
>> certainly wouldn't take it seriously.
> I'm not really worried about Continuum having some nefarious
> "corporate" intent. But I am worried about how these plans will affect
> numpy, and I think there are serious risks if we don't think about
> process. Money has a dramatic effect on FOSS development, and not
> always in a positive way, even when -- or *especially* when --
> everyone has the best of intentions. I'm actually *more* worried about
> altruistic full-time developers doing work on behalf of the community
> than I am about developers who are working strictly in some company's
> interest.
> Finding a good design for software is like a nasty optimization
> problem -- it's easy to get stuck in local maxima, and any one person
> has only an imperfect, noisy estimate of the objective function. So
> you need lots of eyes to catch mistakes, filter out the noise, and
> explore multiple maxima in parallel.
> The classic FOSS model of volunteer developers who are in charge of
> project direction does a *great* job of solving this problem. (Linux
> beat all the classic Unixen on technical quality, and it did it using
> college students and volunteers -- it's not like Sun, IBM, HP etc.
> couldn't afford better engineers! But they still lost.) Volunteers are
> intimately familiar with the itch they're trying to scratch and the
> trade-offs involved in doing so, and they need to work together to
> produce anything major, so you get lots of different, high-quality
> perspectives to help you figure out which approach is best.
> Developers who are working for some corporate interest alter this
> balance, because in a "do-ocracy", someone who can throw a few
> full-time developers at something suddenly has effectively
> complete control over project direction. There's no moral problem here
> when the "dictator" is benevolent, but suddenly you have an
> informational bottleneck -- even benevolent dictators make mistakes,
> and they certainly aren't omniscient. Even this isn't *so* bad though,
> so long as the corporation is scratching their own itch -- at least
> you can be pretty sure that whatever they produce will at least make
> them happy, which implies a certain level of utility.
> The riskiest case is paying developers to scratch someone else's itch.
> IIUC, that's a major goal of Travis's here, to find a way to pay
> developers to make numpy better for everyone. But, now you need some
> way for the community to figure out what "better" means, because the
> developers themselves don't necessarily know. It's not their itch
> anymore. Running a poll or whatever might be a nice start, but we all
> know how tough it is to extract useful design information from users.
> You need a lot more than that if you want to keep the quality up.
> Travis's proposal is that we go from a large number of self-selecting
> people putting in little bits of time to a small number of designated
> people putting in lots of time. There's a major win in terms of total
> effort, but you inevitably lose a lot of diversity of viewpoints. My
> feeling is it will only be a net win if the new employees put serious,
> bend-over-backwards effort into taking advantage of the volunteer
> community's wisdom.
> This is why the NA discussion seems so relevant to me here -- everyone
> involved absolutely had good intentions, excellent skills, etc., and
> yet the outcome is still a huge unresolved mess. It was supposed to
> make numpy more attractive for a certain set of applications, like
> statistical analysis, where R is currently preferred. Instead, there
> have been massive changes merged into numpy mainline, but most of the
> intended "target market" for these changes is indifferent to them;
> they don't solve the problem they're supposed to. And along the way
> we've not just spent a bunch of Enthought's money, but also wasted
> dozens of hours of volunteer time while seriously alienating some of
> numpy's most dedicated advocates in that "target market". We could
> debate about blame, and I'm sure there's plenty to spread around, but
> I also think the fundamental problem isn't one of blame at all -- it's
> that Mark, Charles and Travis *aren't* scratching an itch; AFAICT the
> NA functionality is not something they actually need themselves. Which
> means they're fighting uphill when trying to find the best solutions,
> and haven't managed it yet. And they were working on a deadline, to boot.
>> It's obvious that one should try for consensus as long as possible,
>> including listening to users. But in the very end, when agreement can't
>> be reached by other means, the developers are the ones making the calls.
>> (This is simply a consequence that they are the only ones who can
>> credibly threaten to fork the project.)
>> Sure, structures that include users in the process could be useful...
>> but, if the devs are fine with the current situation (and I don't see
>> Mark or Charles complaining), then I honestly think it is quite rude to
>> not let the matter drop after the first ten posts or so.
> I'm not convinced we need a formal governing body, but I think we
> really, really need a community norm that takes consensus *very*
> seriously. That principle is more important than who exactly enforces
> it. I guess people are worried about that turning into obstructionism
> or something, but seriously, this is a practical approach that works
> well for lots of real actual successful FOSS projects.
> I think it's also worth distinguishing between "users" and "developers
> who happen not to be numpy core developers". There are lots of
> experienced and skilled developers who spend their time on, say, scipy
> or nipy or whatever, just because numpy already works for them. That
> doesn't mean they don't have valuable insights or a stake in how numpy
> develops going forward!
> IMHO, everyone who can credibly participate in the technical
> discussion should have a veto -- and should almost never use it. And
> yes, that means volunteers should be able to screw up corporate
> schedules if that's what's best for numpy-the-project. And, to be
> clear, I'm not saying that random list-members somehow *deserve* to
> screw around with generous corporate endowments; I'm saying that the
> people running the corporation are going to be a lot happier in the
> long run if they impose this rule on themselves.
> -- Nathaniel