This is the mail archive of the mailing list for the GCC project.
Re: source mgt. requirements solicitation
Phil Edwards <email@example.com> writes:
> On Sun, Dec 08, 2002 at 02:06:31PM -0800, Tom Lord wrote:
>> 1) There are frequent reports on this list of glitches with
>> the current CVS repository.
> IIRC, these have all been caused by non-CVS problems. (E.g., disks
> filled up, mail server getting hammered and DoS'ing the other
> services, etc.)
There is one situation that used to come up a lot which is CVS's
fault: a 'cvs server' process dies without removing its lock files,
wedging that directory for everyone else until the locks are manually
removed.
I believe this has been dealt with by some patches to the server plus
a cron job that looks for stale locks; however, a version control
system that could not get into a wedged state like that would be useful.
>> 3) Judging by the messages on this list, there is some tension
>> between the release cycle and feature development -- some
>> issues around what is merged when, and around the impact of
> Yes. I don't see how the choice of revision control software makes a
> difference here. The limiting resource here is people-hours.
CVS makes working on branches quite difficult. I suspect that a
system that made it easier would mean that people were a bit more
comfortable about doing development on branches for long periods of
time.
>> 4) GCC, more than many projects, makes use of a formal review
>> process for incoming patches.
This is a strength, but with a downside -- patches can and do get
lost. We advise people to re-send patches at intervals, but some
sort of automated patch-tracker would probably be helpful. I don't
think the version control system can help much here (but see below).
>> 5) Mark and the CodeSourcery crew seem to do a lot of fairly
>> mechanical work by hand to operate the release cycle.
> Perhaps you haven't looked at contrib/* and maintainer-scripts/* lately?
> Releases and weekly snapshots are all done with those.
I do a fair amount of by-hand work merging the trunk into the
basic-improvements-branch. Some, but not all, of that work could be
facilitated with a better version control system. See below.
>> 11) Some efforts, such as overhauling the build process, will
>> probably benefit from a switch to rev ctl. systems that
>> support tree rearrangements.
I have several changes in mind which I have not done largely because
CVS lacks the ability to version renames. To be specific: move cpplib
to the top level; move gcc/intl to the top level and sync it with the
version of that directory in the src repository; move the C front end
to a language subdirectory like the others; move the Ada runtime
library to the top level.
I'm not saying that I would definitely have done all of these changes
by now if we were using a version control system that handled renames;
only that the lack of rename support is a major barrier to them.
* * *
I'm now going to list the requirements which I would place on a
replacement for CVS, in rough decreasing order of importance. I
haven't done any research to back them up -- this is just off the top
of my head (but having thought about the issue quite a bit).
0. Must be at least as reliable and at least as portable as CVS. GCC
is a very large development effort. We can't afford to lose
contributors because their preferred platform is shut out, nor can
we afford to lose work due to bugs, and we *especially* cannot risk
a system which has not been audited for security exposures. It
would be relatively easy to give much stronger data integrity
guarantees than CVS currently manages:
0a. All data stored in the repository is under an end-to-end
checksum. All data transmitted over the network is independently
checksummed (yes, redundant with TCP-layer checksums). CVS does
no checksumming at all.
0b. Anonymous repository access is done under a user ID that has only
OS-level read privileges on the repository's files. This cannot
be done with (unpatched) CVS.
0c. Remote write operations on the repository intrinsically require
the use of a protocol which makes strong cryptographic integrity
and authority guarantees. CVS can be set up like this, but it's
not built into the design.
0d. The data stored in the repository cannot be modified by
unprivileged local users except by going through the version
control system. Presently I could take 'vi' to one of the ,v
files in /cvs/gcc and break it thoroughly, or sneak something into
the file content, and leave no trace.
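To illustrate 0a and 0d together: if every record the repository
writes carries an end-to-end digest that is verified on every read,
then taking 'vi' to a raw file is detected rather than silently
accepted. A minimal sketch (the record format here is invented, not
any real system's):

```python
import hashlib

def store(record: bytes) -> bytes:
    """Format a record for disk: hex SHA-256 digest, newline, content."""
    return hashlib.sha256(record).hexdigest().encode() + b"\n" + record

def load(stored: bytes) -> bytes:
    """Verify the digest on every read.  An out-of-band edit (say,
    with vi) changes the content but not the stored digest, so the
    mismatch is caught immediately."""
    digest, _, record = stored.partition(b"\n")
    if hashlib.sha256(record).hexdigest().encode() != digest:
        raise IOError("repository record failed its checksum")
    return record
```

The same digest, carried over the wire, gives the independent network
checksum of 0a essentially for free.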
1. Must be at least as fast as CVS for all operations, and should be
substantially faster for all operations where CVS uses a braindead
algorithm. I would venture to guess that everyone's #1 complaint
about CVS is the amount of time we waste waiting for it to complete
this or that request. To be more specific:
1a. Efficient network protocol. Specifically, a network protocol that,
for *all* operations, transmits a volume of data proportional --
with a small constant! -- to the size of the diff involved, *not*
the total size of all the files touched by the diff involved, as
CVS's protocol does.
1b. Efficient tags and branches. It should be possible to create
either by creating *one* metadata record, rather than touching
every single file in the repository.
1c. Efficient delta storage algorithm, such that checking in a change
on the tip of a branch is not orders of magnitude slower than
checking in a change on the tip of the trunk. There are several
sane ways to do this.
1d. Efficient method for extracting a logical change after the fact,
no matter how many files it touched. (Currently the easiest way
to do this is: hunt through the gcc-cvs archive until you find the
message describing the checkin you care about, then use wget on
all of the per-file diff URLs in the list and glue them all
together. Slow, painful, doesn't always work.)
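Points 1b and 1d both fall out of one storage decision. Under a
hypothetical model (all names invented) where each commit is stored as
a single change-set record keyed by a global revision id, a tag is one
metadata record and extracting a logical change is one lookup:

```python
# Hypothetical repository model: every commit is one change-set record
# keyed by a global revision id.

changesets = {}   # revision id -> {"log": ..., "diffs": {path: diff}}
tags = {}         # tag name -> revision id: *one* metadata record

def commit(rev, log, diffs):
    changesets[rev] = {"log": log, "diffs": dict(diffs)}

def make_tag(name, rev):
    tags[name] = rev          # 1b: O(1), no matter how big the tree is

def extract_change(rev):
    """1d: the whole logical change, however many files it touched."""
    return changesets[rev]["diffs"]
```

Contrast with CVS, where tagging rewrites every ,v file and
reassembling a logical change means gluing per-file diffs back
together by hand.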
2. Should support this laundry list of features, none of which is
known to CVS. Most of them would be useful independent of the
others, though there's not much point to 2b without 2a, nor 2e
without 2d.
2a. Atomic application of a logical change that touches many files,
possibly not all in the same directory. (This is commonly known as
a "change set".) One checkin log per change set is adequate.
2b. Ability to back out an entire change set just as atomically as it
was applied.
2c. Ability to rename a file, including the ability for a file to have
different names on different branches.
2d. Automatically remember that a merge occurred from branch A to
branch B; later, when a second merge occurs from A to B, don't
apply those changes again.
2e. Understand the notion of a single-delta merge, either applying
just one change from branch A to branch B, or removing just one
change formerly on branch A ("subtractive merge").
2f. Perform conflict resolution by automatic formation of
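The heart of 2a/2b/2d can be sketched under one invented model: treat
a branch as the ordered list of change-set ids that have been applied
to it. (This is a toy illustration of the bookkeeping, not a merge
algorithm -- real merges must also apply the diffs.)

```python
# A branch is modeled, hypothetically, as the ordered list of
# change-set ids applied to it.

def merge(source, target):
    """2d: replay only what the target has not already seen, so a
    second merge from the same source applies nothing twice."""
    seen = set(target)
    new = [c for c in source if c not in seen]
    target.extend(new)
    return new

def back_out(branch, changeset_id):
    """2b: remove one logical change as a unit, as atomically as it
    was applied."""
    branch.remove(changeset_id)
```

With the merge history recorded this way, the repeated trunk-to-branch
merges described above stop re-applying (and re-conflicting on) changes
the branch already has.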
3. Should allow a user without commit privileges to generate a change
set, making arbitrary changes to the repository (none of this "you
can edit files and generate diffs but you can't add or delete
files" nonsense), which can be applied by a user who does have
commit privileges, and when the original author does an update
he/she doesn't get spurious conflicts.
4. The repository's on-disk data should be stored in a highly compact
format, to the maximum extent possible and consonant with being
fast. Being fast is much more important; however, GCC's CVS
repository is ~800MB in size and compresses down to ~100MB. You
can do interesting things (like keep a copy of the entire
repository on every developer's personal hard disk, as Bitkeeper
does) with a 100MB repository that are not so practical when it's
closer to a gigabyte.
5. Should have the ability to generate ChangeLog files automagically
from the checkin comments. (When merging to basic-improvements I
normally spend more time fixing up the ChangeLogs than anything
else. Except maybe waiting for 'cvs tag' and 'cvs update -j...'.)
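If checkin records carried structured fields (the field names below
are invented for the sketch -- date, author, log message, files
touched), emitting GNU-style ChangeLog entries would be a formatting
exercise:

```python
# Sketch of item 5: format one checkin record as a GNU-style
# ChangeLog entry.  All field names are hypothetical.

def changelog_entry(date, author, email, log, files):
    header = "%s  %s  <%s>" % (date, author, email)
    body = "\t* %s: %s" % (", ".join(files), log)
    return header + "\n\n" + body + "\n"
```

Merging would then regenerate the ChangeLog from the merged history
instead of three-way-merging the file itself, which is where most of
the hand-fixing currently goes.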