[Numpy-discussion] Moving away from svn ?

David Cournapeau david@ar.media.kyoto-u.ac...
Fri Jan 4 05:24:04 CST 2008


    First things first, happy new year to all!

    Having recently felt the pain of using subversion merge, I was
wondering how people feel about moving away from subversion to a
better system, such as mercurial or bzr (I will talk about bzr because
that's the one I know best, but this discussion is really about using
something better than subversion, not so much about bzr). I think this
could be an important step forward, and it is somewhat related to the
discussions on scikits and co.
    As some of you are certainly aware, there has been a recent trend
towards so-called Distributed Version Control Systems (DVCS). I won't go
into the details, because they vary from system to system, and I am in
no position to explain the technical side. But for people who are
wondering, here is a short description of DVCS, and why I think this can
be a significant step forward for numpy/scipy. You can skip it if you
already know about them.

What is a DVCS?

    A DVCS, unlike centralized systems such as CVS or SVN, has no
technical concept of a central repository that everybody pulls changes
from and pushes changes to. Instead, a DVCS is centered around the
concept of a branch, which contains a local copy of the history. As a
consequence:

    1 you can do most traditional svn/cvs operations locally, while
disconnected from the network (getting the log, getting annotations,
committing, branching, merging from other branches).
    2 because the branch is local, no access rights are needed: anybody
can jump in and commit to a local branch. Of course, integration into an
official numpy branch would still need some special approval.

This has a further consequence: since branching and merging are so
central to a DVCS, merging actually works. In particular, merging the
same changes several times works, and you certainly do not have to go
through the whole svn madness of keeping track of revisions yourself.
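
To make the repeated-merge point concrete, here is a toy sketch in
Python. It is not how bzr, mercurial or git are actually implemented:
the Branch class, its methods and the revision ids are all made up for
illustration. The only idea it borrows is that each branch carries its
own copy of the history, so a merge can work out by itself which
revisions are still missing:

```python
# Toy model only (an assumption for illustration, not any real DVCS API):
# a branch is just an ordered list of revision ids it already contains.

class Branch:
    def __init__(self, name, revisions=()):
        self.name = name
        self.revisions = list(revisions)  # local copy of the history

    def commit(self, rev_id):
        """Record a new revision locally; no server is involved."""
        self.revisions.append(rev_id)

    def branch(self, name):
        """Create a new branch that starts with a copy of our history."""
        return Branch(name, self.revisions)

    def missing_from(self, other):
        """Revisions that `other` has and we do not, in other's order."""
        known = set(self.revisions)
        return [r for r in other.revisions if r not in known]

    def merge(self, other):
        """Bring in only the revisions we lack; return what was added."""
        new = self.missing_from(other)
        self.revisions.extend(new)
        return new


if __name__ == "__main__":
    trunk = Branch("trunk", ["r1", "r2"])
    feature = trunk.branch("feature")
    feature.commit("f1")          # work done on the feature branch
    trunk.commit("r3")            # trunk moves on in the meantime

    print(trunk.merge(feature))   # first merge brings in ['f1']
    print(trunk.merge(feature))   # merging again brings in nothing: []
```

Merging the same branch a second time finds nothing new, which is why
nobody has to remember which revisions were already merged.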

For more information, here are some links which go much deeper:
    - some discussion from K. Richard, the maintainer of X.org: 
    - Linus Torvalds on the advantages of git, the DVCS he wrote for
linux development, versus svn for kde (long, but it makes all the
points really clearly): 

Why use a DVCS?

Some people argue that DVCS are intrinsically more complicated, which is
something I really don't understand. I've been programming 'seriously'
for only about 2-3 years, and I find bzr much easier to use and to set
up than subversion; the key point, I think, is that I started using a
DVCS before a centralized system. Some things which are utterly
complicated with subversion are trivial with bzr: merging, and going
back in the history (say you are at rev 150, you realize that everything
since rev 140 is rubbish, and you want to go back: this is extremely
tedious to do with subversion). Basically, most of the things which are
the reasons why we use a VCS in the first place are easier with a DVCS
than with a centralized VCS (at least as far as svn is concerned). Also:

- For a casual user who wants to use the latest development version
instead of a release, getting it from a bzr repository, a git
repository, a mercurial repository or a svn repository is extremely
similar. It is one step in all cases.

- For casual developers: being able to use branches means that they can
implement their new features in a change-set oriented way instead of as
one big patch. Also, bzr enables things like uncommit if you made a
mistake and want to go back. More generally, going back in history is
much easier.

- For core developers: I personally find the ability to use a branch for
each new feature to be extremely useful. It makes me feel much safer
when I do something. I am not afraid of doing something totally stupid
which may end up screwing things up for other people.

And finally, I find the ability to do things locally really pleasant,
and it enables workflows that are not really possible with systems such
as SVN. In particular, I work at three distant places every week, and
the ability to work while in transit, plus the trivial synchronization
between computers, is definitely helpful. Instantaneous log and
annotations are also really useful IMHO.

Which DVCS?

The three which keep coming up are:
    - git (the one used for linux kernel development). That's the one I
know the least (only from a user point of view, never used it for
development). It is supposed to be more powerful but more complicated
than the others. It is also known to be really fast (the kernel is
certainly not a small codebase).
    - mercurial: started at about the same time as git. It is written in
python, except for a few things written in C. It is reasonably fast, and
has recently been selected for some big projects, in particular by Sun
(OpenJDK, OpenSolaris, NetBeans).
    - bzr: also written in python. Sponsored by Canonical, the company
behind Ubuntu. It has just reached version 1.0. The focus is on the UI,
and it handles renaming really well. It has a vibrant community, with
dedicated developers working on it. It has the reputation of being slow,
which used to be somewhat true, but in my experience it is on par with
mercurial, at least for local operations. Anyway, speed is not a problem
for numpy or scipy, which are small codebases (a few thousand files, a
few thousand revisions).


Assuming people think it is worth trying out, I see two main problems:
    - importing the current history
    - integration with trac

For bzr, I can say that the bzr-svn plugin works really well; in
particular, it can import the numpy and scipy repositories with their
whole history. I am using it regularly as a proxy between local bzr
branches and the scipy and scikits trunk. Incidentally, this makes it
possible for me to give numbers, if numbers are needed, on bzr's speed,
repository size, etc.

For mercurial, I tried one method once which did not get very far, but
I did not try really hard; anyway, I think people at Enthought use
mercurial a lot, so they would know better.

Integration with trac is the real problem, I think. According to one bzr
developer, the trac model (as of 0.10, the latest release) is built
around subversion's notion of a repository, which does not fit well with
mercurial and bzr. I don't know if this is still true for the not yet
released 0.11. If bzr is considered a possible candidate, I can get more
information from the bzr developers. What is the experience of the
Enthought developers with trac?

This email is already getting pretty long, so to conclude: I think a
DVCS would be helpful for the future development of numpy/scipy. I
believe it would both make participation easier for different people and
enable safer development schedules, etc. What do other people think?
Would it be worthwhile to discuss the issues further, and how to resolve
them?



P.S: I would be willing to take care of the bzr side of things: trying
the conversion, setting up experimental repositories for trial, and
asking the bzr community for advice.
