[SciPy-dev] The future of SciPy and its development infrastructure
Mon Feb 23 10:31:31 CST 2009
Charles R Harris wrote:
> On Mon, Feb 23, 2009 at 9:04 AM, Stéfan van der Walt wrote:
> *[If you only have 30 seconds to read this email, read the bold
> text only]*
> *Dear SciPy developers*
> The past while has seen a rocky ride with the SciPy servers, but
> yesterday Peter Wang announced that he is attending to the
> situation. This, then, seems like the perfect time to *stand back
> and take a look at our infrastructure*, and whether we should
> continue with the current setup.
> To put this conversation into context, we have to face the facts:
> SciPy has a large user community relative to the number of
> developers. A big library of code, used by many scientists, is
> supported by a small handful of people all over the world. *We
> cannot afford a high barrier to contribution*, and we have to
> lower the effort it takes for a developer to merge contributed code.
> *I'd like to propose two changes* to the status quo:
> 1. *Change to a distributed revision control system*, encouraging
> more open collaboration.
> 2. *Determine guidelines for code acceptance*, in terms of unit
> tests, documentation and peer review.
> Allow me to motivate these changes, and then suggest practical
> approaches for their implementation:
> Subversion allows only a selected group of developers to change
> the SciPy source code. This does not encourage a culture of
> meritocracy, but worse, has practical implications, in that users
> cannot merge their own patches. I won't discuss the advantages of
> distributed revision control here, but note that it shifts
> responsibility from the current core developers to contributers;
> *that benefits us all!*
> This ties in with my second point: code review. The current
> developers have access to SVN because they are experienced
> programmers with knowledge of SciPy's scientific domains of
> application. We are unable to employ this scarce resource fully,
> because it simply takes too long to merge a patch from Trac,
> review it, *bring it up to scratch*, and commit it. *We have to
> put a system in place which allows contributors to take
> responsibility for their own patches, and core developers to
> guide and advise during this process.* As it is, we have many
> patches waiting on Trac for a year or more without any
> feedback; that is not acceptable.
> My view on testing is simple: *untested code is probably broken
> code* (and I can show examples from the past year's commit logs to
> corroborate this statement). *As for documentation, we cannot
> afford to be without it.*
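To make the testing and documentation guidelines concrete, here is a minimal sketch of what an acceptable contribution might look like: a function with a NumPy-style docstring (including a doctest) plus unit tests. The function `trapezoid_rule` and the test class are hypothetical illustrations, not existing SciPy code, and pure Python is used only to keep the sketch self-contained.

```python
import unittest
from typing import Sequence


def trapezoid_rule(y: Sequence[float], dx: float = 1.0) -> float:
    """Integrate uniformly spaced samples with the composite trapezoidal rule.

    Parameters
    ----------
    y : sequence of float
        Function values sampled at equally spaced points.
    dx : float, optional
        Spacing between consecutive samples. Default is 1.0.

    Returns
    -------
    float
        Approximation of the integral over the sampled interval.

    Examples
    --------
    >>> trapezoid_rule([0.0, 1.0, 2.0])
    2.0
    """
    if len(y) < 2:
        raise ValueError("need at least two samples")
    # Interior points get weight 1, the two endpoints weight 1/2.
    return dx * (sum(y) - 0.5 * (y[0] + y[-1]))


class TestTrapezoidRule(unittest.TestCase):
    def test_linear_function_is_exact(self):
        # The trapezoidal rule integrates straight lines exactly.
        self.assertEqual(trapezoid_rule([0.0, 1.0, 2.0]), 2.0)

    def test_quadratic_converges(self):
        # y = x**2 on [0, 1] with 101 points; the exact integral is 1/3.
        n = 101
        dx = 1.0 / (n - 1)
        y = [(i * dx) ** 2 for i in range(n)]
        self.assertAlmostEqual(trapezoid_rule(y, dx), 1.0 / 3.0, places=4)

    def test_too_few_samples(self):
        with self.assertRaises(ValueError):
            trapezoid_rule([1.0])
```

Saved in a module, the tests run with `python -m unittest <module>` and the docstring example with `python -m doctest <module>`; a reviewer can then check both before merging.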
> Enthought generously hosts SciPy, and I hope they will continue
> doing so. New software will need to be installed on the server,
> but we have many hands willing to tackle that task: David
> Cournapeau and myself included. Before deploying to scipy.org,
> *we will configure a different server as a proof of concept.*
> 1) *Distributed revision control system: David Cournapeau and
> myself have been test-driving Git on SciPy and NumPy for a
> while. It is fast, well supported, has great branch support, and
> is simple to use for the average contributor, while allowing
> powerful patch-carving for the more adventurous.*
> I really like Git, but... the last time I looked, Windows support
> wasn't up to snuff. Does anyone have more recent feedback on the
> Windows situation?
It is not ideal: it is based on a bash shell. But it no longer requires
Cygwin: you can grab an exe and install it on your machine, e.g. for
cloning and submitting a patch to the bug tracker. If you want a GUI,
it won't work (but no DVCS has a decent GUI: TortoiseBZR and TortoiseHG
are far behind what I would expect from a reasonable GUI on Windows). I
think git will never be on par with other tools on Windows, because git
is fundamentally ingrained in the Unix mentality (a set of tools that
communicate with each other through text).
But after having used bzr for more than 2 years, I am entirely convinced
that git is far ahead of bzr or even hg (I don't know hg well: I looked
at it at some point because it had the best svn support, but I have not
followed it recently, whereas I still closely follow bzr development).
IMHO, git's speed advantage is overemphasized: even if git were as slow
as bzr, I would prefer git today.
My main worries about using git for numpy/scipy are related to the bug
tracker: tracking branches is more of a problem than I initially
thought. GitHub has some very nice concepts, but none of the git
hosting services can yet, for example, display the history graph, which
I think would be very helpful for newcomers (the tools exist locally,
though).
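Locally, the history graph can be drawn with `git log --graph` (or viewed in `gitk`). A minimal Python sketch, assuming git is installed and on the PATH; the helper names `history_graph_command` and `show_history_graph` are hypothetical:

```python
import subprocess
from typing import List


def history_graph_command(all_branches: bool = True) -> List[str]:
    """Build the git command line that draws the commit graph as ASCII art."""
    # --graph draws the branch/merge structure, --oneline keeps one commit
    # per line, --decorate shows branch and tag names next to commits.
    cmd = ["git", "log", "--graph", "--oneline", "--decorate"]
    if all_branches:
        # Show every ref (local branches, remote-tracking branches, tags),
        # not only the history reachable from HEAD.
        cmd.append("--all")
    return cmd


def show_history_graph(repo_path: str = ".", all_branches: bool = True) -> str:
    """Run git inside repo_path and return the rendered graph as text."""
    result = subprocess.run(
        history_graph_command(all_branches),
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git fails (e.g. not a repo)
    )
    return result.stdout
```

For example, `print(show_history_graph("path/to/numpy-clone"))` run in a local clone shows how feature branches fork and merge, which is the view a web-based hosting service would ideally offer newcomers.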