This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Notes from the version control BOF at the summit


We held a version control BOF yesterday at the summit.  I promised to
send out the notes which I took, and here they are.  These represent
only my own opinions, not any sort of group consensus.  These only
include the ideas which I wrote down; I may have missed other items.
I do not personally have a strong opinion on these issues, and this is
mostly a report, not an argument.

gcc has been developed using CVS since the egcs project was started in
1997.  (Before that, gcc was developed using RCS.)  While CVS is
clearly able to support gcc development, it has several well-known
deficiencies, some of which are mentioned below.

There are a number of other version control options available today,
including cvsnt, subversion, monotone, arch, and darcs.  (Bitkeeper is
ruled out, as it is not free software.)  It is appropriate for us to
consider switching to a different version control system.
Accordingly, at the BOF we discussed what we would like to see in a
version control system for gcc.  Also, Graydon Hoare, the author of
monotone, has the opportunity to put some time into version control
development, and was looking for suggestions.

It's fairly clear what gcc requires in a version control system: it
requires those features of CVS which we use every day.  This is not
every feature of CVS--we don't use wrappers, for example--but it is
most of them.  It is less clear what features gcc could benefit from,
but here is a list proposed by people at the BOF.

* Must preserve existing history
    We cannot discard our existing CVS revision information.  It was
    bad enough losing the older RCS information (it is available in
    the old-gcc directory in the CVS repository on gcc.gnu.org, but
    that is inconvenient).

* Must support exporting to CVS
    In case we change our minds.

* The tool must be around for a while, and continue to be supported
    We don't want to have to change again, at least not for a while.

* Must support a single blessed version
    CVS cannot operate in any other fashion, but this is an issue for
    a decentralized system; more on this below.

* Atomic change sets
    The ability to refer to a complete checkin as a single unit.  CVS
    does not support this: if you check in several files
    simultaneously, CVS acts as though you made separate checkins to
    each file.

* Track renames
    CVS has no way to rename a file.  Removing and adding a file loses
    the revision history.

* Cheap branches
    In CVS, creating a branch is a slow operation.

* Cheap tagging
    Similarly, in CVS creating a tag is a slow operation.

* Remember last branch merge point
    When merging between branches, automatically remember the last
    point at which the branches were merged, in order to easily do an
    incremental merge.

* Repository cloning
    It would be nice to easily make a copy of the repository.

* Ability to work offline
    For example, the ability to do diffs and examine logs while not
    connected to the Internet.  Some systems support doing a commit
    while not connected; more on this below.

* Ability to know which branches a particular patch is on
    It is useful to know whether a specific change has been put on a
    particular branch, or conversely which branches are missing the
    change.

* Copy a patchset to a particular branch
    A mechanism to describe a patch set, which is one or more changes,
    and then to put that patchset onto a particular branch.  Naturally
    this may cause conflicts.  However, some mechanism better than
    diff and patch would be useful.

* Run on Unix, Windows, Mac
    CVS does this already, and we need it.

* Language support and character set conversion
    A simple case is appropriate line endings for text files.

* Support text and binary files
    Similarly, text and binary files need to be handled appropriately.
    CVS does this today.

* Text search through version history
    The ability to ask which patches changed a particular symbol.

* Ask who deleted a line
    We can find out who added a line using 'cvs annotate', but it's
    harder to ask who deleted a line.

* Be able to reliably backup the repository
    This is not simple with CVS.  Our backup system does not
    coordinate with the CVS locking, so we may do a backup in the
    middle of a commit.

* Scale large enough to handle the complete Cygnus/Red Hat development
  repository
    This is the largest single repository we know of, with many
    projects, in CVS, dating back to 1991.

* Make it easy to know the last known good state
    Provide an easy way to update a working directory to the last
    state which was known to work for a particular target.

* Integrate with bugzilla and gcc-patches
    CVS currently integrates with bugzilla--when an appropriate marker
    is put into the commit message, the commit message is added to the
    PR.  Integration with gcc-patches would also be nice--provide an
    automatic link from the log entry to the archived e-mail message
    describing the patch.

* Cross-merge with other projects using the same version control
  system
    A particular need for libgcj, which would like to merge back and
    forth with Classpath (or something like that).

* Push button approval for patches, for maintainers to use
    It would be nice to provide a mechanism for maintainers to be able
    to just push a button to commit a patch, rather than directing the
    submitter to commit it.

* Easily revoke approval
    It would be nice to be able to revoke approval for a patch.  Today
    this is done by explicitly reverting the patch.
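
As a concrete illustration of two of the items above (atomic change
sets, and knowing which branches a patch is on), here is a minimal
sketch in Python.  It is a toy model, not how any of the systems
mentioned actually stores data; the names ChangeSet, Repository,
branches_with and branches_missing are hypothetical, as are the branch
names and file paths.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChangeSet:
    """One checkin, covering every file it touched (hypothetical model)."""
    change_id: str
    author: str
    files: tuple  # paths modified together in this single checkin


@dataclass
class Repository:
    # branch name -> ordered list of change-set ids applied to that branch
    branches: dict = field(default_factory=dict)

    def commit(self, branch, change):
        """Record the whole change set on a branch as one unit."""
        self.branches.setdefault(branch, []).append(change.change_id)

    def branches_with(self, change_id):
        """Branches that already carry the given change set."""
        return sorted(b for b, ids in self.branches.items() if change_id in ids)

    def branches_missing(self, change_id):
        """Branches that do not yet carry the given change set."""
        return sorted(b for b, ids in self.branches.items() if change_id not in ids)


repo = Repository()
fix = ChangeSet("cs-1", "alice", ("gcc/fold-const.c", "gcc/ChangeLog"))
repo.commit("mainline", fix)
repo.commit("release-branch", fix)
repo.branches.setdefault("dev-branch", [])  # a branch without the fix

print(repo.branches_with("cs-1"))
print(repo.branches_missing("cs-1"))
```

Because the change set has a single identifier covering all of its
files, both questions reduce to a lookup on that identifier.  With
per-file revisions, as in CVS, there is no such identifier to look up.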


In general, version control systems break down into centralized
systems and decentralized systems.  CVS is a centralized system.  In a
decentralized system, such as monotone or arch, everybody using the
system has a complete copy of the repository.  While this might
initially seem an absurd requirement, in fact today's cheap disk
prices make it feasible.  For example, the complete gcc repository
requires on the order of 2 gigabytes.

A decentralized system makes many common operations faster.  It makes
it possible to work offline, including doing commits offline.

Offline commits bring us to the issue of commit serialization.  We
require a single blessed version.  This means that we must address the
issue of effectively simultaneous commits to a single file.  CVS
prevents simultaneous commits by enforcing serialization at the
server--the second commit is rejected with an "Up-to-date check
failed" error.  This is not an option for an offline commit.  (Note
that even for a decentralized system, we can choose to use enforced
serialization when not working offline, perhaps through an external
program.)
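
The server-side serialization described above can be sketched as
follows.  This is a simplified model in Python, not CVS's actual
implementation: the class CentralServer, the exception name, and the
integer revision numbers are all assumptions made for illustration.

```python
class UpToDateCheckFailed(Exception):
    """Raised when a commit is not based on the current head revision."""


class CentralServer:
    def __init__(self):
        self.head = {}  # path -> current revision number (0 = no revisions)

    def commit(self, path, base_revision):
        """Accept the commit only if it was made against the current head."""
        current = self.head.get(path, 0)
        if base_revision != current:
            raise UpToDateCheckFailed(
                f"{path}: based on revision {base_revision}, head is {current}"
            )
        self.head[path] = current + 1
        return self.head[path]


server = CentralServer()
first = server.commit("gcc/tree.c", base_revision=0)  # accepted
try:
    # A second committer who also started from revision 0 is rejected
    # and must update (that is, merge) before trying again.
    server.commit("gcc/tree.c", base_revision=0)
    second_rejected = False
except UpToDateCheckFailed:
    second_rejected = True
```

The check requires talking to the server at commit time, which is
exactly what an offline commit cannot do.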

Another way to describe the problem is to ask, in the case of an
effectively simultaneous commit, who does the merge?  In CVS, the
committer does the merge.  From the point of view of gcc, this is a
distributed solution to the problem, automatically pushing the
required effort back to the person who cares the most.

In the case of an offline commit, the master repository may receive
multiple effectively simultaneous commits.  These commits must be
serialized, or, to put it another way, they must be merged.  How will
this be done?

[ I apologize if I get some details wrong in the following. ]

In monotone, each branch can have multiple heads.  A merge operation
is used to do a three-way merge among the multiple heads.  This can
then lead to conflicts, which must be resolved, more or less similar
to how CVS works today.  Monotone supports signing particular
revisions.  The blessed version of gcc would be the version on the
appropriate branch most recently signed by an appropriate key or keys.
In other words, the maintainer does the merge, although this could be
partially automated.
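
A rough sketch of that model, under stated assumptions: the Branch
class and its methods below are hypothetical and are not monotone's
API, and "most recently signed" is approximated by the order in which
revisions arrived.

```python
class Branch:
    def __init__(self):
        self.parents = {}    # revision id -> tuple of parent revision ids
        self.order = []      # revision ids in the order they arrived
        self.signed_by = {}  # revision id -> set of key names that signed it

    def add(self, rev, parents=()):
        self.parents[rev] = tuple(parents)
        self.order.append(rev)
        self.signed_by[rev] = set()

    def heads(self):
        """Revisions with no children; more than one means a merge is due."""
        has_child = {p for ps in self.parents.values() for p in ps}
        return [r for r in self.order if r not in has_child]

    def sign(self, rev, key):
        self.signed_by[rev].add(key)

    def blessed(self, trusted_keys):
        """Latest-arrived revision signed by at least one trusted key."""
        for rev in reversed(self.order):
            if self.signed_by[rev] & trusted_keys:
                return rev
        return None


b = Branch()
b.add("base")
b.sign("base", "maintainer")
b.add("alice-fix", ["base"])  # two offline commits from the same parent
b.add("bob-fix", ["base"])
heads_before = b.heads()
blessed_before = b.blessed({"maintainer"})

b.add("merge", ["alice-fix", "bob-fix"])  # merge the two heads
b.sign("merge", "maintainer")
heads_after = b.heads()
blessed_after = b.blessed({"maintainer"})
```

In this model the unsigned heads are visible to everyone who syncs,
but the blessed version only advances when a trusted key signs a
revision.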

In arch, each repository is a serialization point for changes.
Changes are not pushed from one repository to another, they are
pulled.  Thus somebody (or some automated process) at the master
repository would be responsible for pulling in revisions from other
repositories.  Here the committer does the merge, by running a process
on the master repository.
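
The pull model can be sketched as below.  This is a toy Python model
of "each repository is a serialization point", not arch's actual
behavior or command set; the class ArchRepo and the change identifiers
are hypothetical.

```python
class ArchRepo:
    def __init__(self, name):
        self.name = name
        self.log = []  # change ids in the order this repository accepted them

    def commit(self, change_id):
        """Local commit; nothing leaves this repository."""
        self.log.append(change_id)

    def pull_from(self, other):
        """Append the other repository's changes that we do not have yet."""
        pulled = [c for c in other.log if c not in self.log]
        self.log.extend(pulled)
        return pulled


master = ArchRepo("master")
alice = ArchRepo("alice")
bob = ArchRepo("bob")
alice.commit("a1")
bob.commit("b1")

# The order of the pulls on the master decides the serialization order.
pulled_from_alice = master.pull_from(alice)
pulled_from_bob = master.pull_from(bob)
```

A real pull would also have to merge the incoming changes against
what the master already holds; the sketch only shows where the
ordering decision is made.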

darcs is more or less like arch, I think.

In other words, with arch or darcs, committed changes reach the master
repository only when they are explicitly pulled into it.  With
monotone, you need to sync with the master repository; the sync
process will automatically transfer your commits, although those
commits will not thereby become blessed.

The main serialization problem is, of course, the ChangeLog file.  For
most of these systems, it should be possible to write an automatic
merge for the ChangeLog file which will do the right thing.
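
One way such an automatic merge might work, as a minimal Python
sketch: it assumes that both sides only ever prepend new entries to
the top of the ChangeLog, which is the usual convention but is not
enforced by anything.  The function names and the sample entries are
hypothetical.

```python
def new_entries(base: str, version: str) -> str:
    """Return the text that `version` prepended on top of `base`."""
    if not version.endswith(base):
        raise ValueError("version does not extend base by prepending")
    return version[: len(version) - len(base)]


def merge_changelog(base: str, ours: str, theirs: str) -> str:
    """Three-way merge: incoming entries on top, then ours, then the base."""
    return new_entries(base, theirs) + new_entries(base, ours) + base


base = "2005-06-01  A. Hacker\n\n\t* a.c: Fix.\n\n"
ours = "2005-06-20  B. Hacker\n\n\t* b.c: Fix.\n\n" + base
theirs = "2005-06-21  C. Hacker\n\n\t* c.c: Fix.\n\n" + base

merged = merge_changelog(base, ours, theirs)
```

A plain textual merge reports a conflict here, since both sides
changed the same place at the top of the file.  A merge that knows
about the prepend-only convention can resolve it without help, and
raises an error when the convention was not followed.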


Now that I've written all that, I don't have any recommendations for
action.  But I would like to hear comments from other people.  Since
this could easily degenerate into a mild flamewar, please try to
provide specific reasons for any argument you make, whether for
staying with CVS, for switching to a particular system (or not
switching to it), or for adding features to some system.

Ian

