This is the mail archive of the mailing list for the GCC project.
Re: source mgt. requirements solicitation
Phil Edwards <email@example.com> writes:
> On Sun, Dec 08, 2002 at 02:06:31PM -0800, Tom Lord wrote:
>> 1) There are frequent reports on this list of glitches with
>> the current CVS repository.
> IIRC, these have all been caused by non-CVS problems. (E.g., disks
> filled up, mail server getting hammered and DoS'ing the other
> services, etc.)
There is one situation that used to come up a lot which is CVS's
fault: a 'cvs server' process dies without removing its lock files,
wedging that directory for everyone else until the locks are manually
removed.
I believe this has been dealt with by some patches to the server plus
a cron job that looks for stale locks; however, a version control
system that could not get into a wedged state like that would be useful.
>> 3) Judging by the messages on this list, there is some tension
>> between the release cycle and feature development -- some
>> issues around what is merged when, and around the impact of
> Yes. I don't see how the choice of revision control software makes a
> difference here. The limiting resource here is people-hours.
CVS makes working on branches quite difficult. I suspect that a
system that made it easier would mean that people were a bit more
comfortable about doing development on branches for long periods of
time.
>> 4) GCC, more than many projects, makes use of a formal review
>> process for incoming patches.
This is a strength, but with a downside -- patches can and do get
lost. We advise people to re-send patches at intervals, but some
sort of automated patch-tracker would probably be helpful. I don't
think the version control system can help much here (but see below).
>> 5) Mark and the CodeSourcery crew seem to do a lot of fairly
>> mechanical work by hand to operate the release cycle.
> Perhaps you haven't looked at contrib/* and maintainer-scripts/* lately?
> Releases and weekly snapshots are all done with those.
I do a fair amount of by-hand work merging the trunk into the
basic-improvements-branch. Some, but not all, of that work could be
facilitated with a better version control system. See below.
>> 11) Some efforts, such as overhauling the build process, will
>> probably benefit from a switch to rev ctl. systems that
>> support tree rearrangements.
I have several changes in mind which I have not done largely because
CVS lacks the ability to version renames. To be specific: move cpplib
to the top level; move gcc/intl to the top level and sync it with the
version of that directory in the src repository; move the C front end
to a language subdirectory like the others; move the Ada runtime
library to the top level.
I'm not saying that I would definitely have done all of these changes
by now if we were using a version control system that handled renames;
only that the lack of rename support is a major barrier to them.
* * *
I'm now going to list the requirements which I would place on a
replacement for CVS, in rough decreasing order of importance. I
haven't done any research to back them up -- this is just off the top
of my head (but having thought about the issue quite a bit).
0. Must be at least as reliable and at least as portable as CVS. GCC
is a very large development effort. We can't afford to lose
contributors because their preferred platform is shut out, nor can
we afford to lose work due to bugs, and we *especially* cannot risk
a system which has not been audited for security exposures. It
would be relatively easy to give much stronger data integrity
guarantees than CVS currently manages:
0a. All data stored in the repository is under an end-to-end
checksum. All data transmitted over the network is independently
checksummed (yes, redundant with TCP-layer checksums). CVS does
no checksumming at all.
0b. Anonymous repository access is done under a user ID that has only
OS-level read privileges on the repository's files. This cannot
be done with (unpatched) CVS.
0c. Remote write operations on the repository intrinsically require
the use of a protocol which makes strong cryptographic integrity
and authority guarantees. CVS can be set up like this, but it's
not built into the design.
0d. The data stored in the repository cannot be modified by
unprivileged local users except by going through the version
control system. Presently I could take 'vi' to one of the ,v
files in /cvs/gcc and break it thoroughly, or sneak something into
the file content, and leave no trace.
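To illustrate 0a and 0d together: if every record the repository
writes carries an end-to-end digest that is verified on every read,
then taking 'vi' to a raw file is detected rather than silently
accepted. A minimal sketch (the record format here is invented, not
any real system's):

```python
import hashlib

def store(record: bytes) -> bytes:
    """Format a record for disk: hex SHA-256 digest, newline, content."""
    return hashlib.sha256(record).hexdigest().encode() + b"\n" + record

def load(stored: bytes) -> bytes:
    """Verify the digest on every read.  An out-of-band edit (say,
    with vi) changes the content but not the stored digest, so the
    mismatch is caught immediately."""
    digest, _, record = stored.partition(b"\n")
    if hashlib.sha256(record).hexdigest().encode() != digest:
        raise IOError("repository record failed its checksum")
    return record
```

The same digest, carried over the wire, gives the independent network
checksum of 0a essentially for free.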
1. Must be at least as fast as CVS for all operations, and should be
substantially faster for all operations where CVS uses a braindead
algorithm. I would venture to guess that everyone's #1 complaint
about CVS is the amount of time we waste waiting for it to complete
this or that request. To be more specific:
1a. Efficient network protocol. Specifically, a network protocol that,
for *all* operations, transmits a volume of data proportional --
with a small constant! -- to the size of the diff involved, *not*
the total size of all the files touched by the diff involved, as
CVS's protocol does.
1b. Efficient tags and branches. It should be possible to create
either by creating *one* metadata record, rather than touching
every single file in the repository.
1c. Efficient delta storage algorithm, such that checking in a change
on the tip of a branch is not orders of magnitude slower than
checking in a change on the tip of the trunk. There are several
sane ways to do this.
1d. Efficient method for extracting a logical change after the fact,
no matter how many files it touched. (Currently the easiest way
to do this is: hunt through the gcc-cvs archive until you find the
message describing the checkin you care about, then use wget on
all of the per-file diff URLs in the list and glue them all
together. Slow, painful, doesn't always work.)
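Points 1b and 1d both fall out of one storage decision. Under a
hypothetical model (all names invented) where each commit is stored as
a single change-set record keyed by a global revision id, a tag is one
metadata record and extracting a logical change is one lookup:

```python
# Hypothetical repository model: every commit is one change-set record
# keyed by a global revision id.

changesets = {}   # revision id -> {"log": ..., "diffs": {path: diff}}
tags = {}         # tag name -> revision id: *one* metadata record

def commit(rev, log, diffs):
    changesets[rev] = {"log": log, "diffs": dict(diffs)}

def make_tag(name, rev):
    tags[name] = rev          # 1b: O(1), no matter how big the tree is

def extract_change(rev):
    """1d: the whole logical change, however many files it touched."""
    return changesets[rev]["diffs"]
```

Contrast with CVS, where tagging rewrites every ,v file and
reassembling a logical change means gluing per-file diffs back
together by hand.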
2. Should support this laundry list of features, none of which is
known to CVS. Most of them would be useful independent of the
others, though there's not much point to 2b without 2a, nor 2e
without 2d.
2a. Atomic application of a logical change that touches many files,
possibly not all in the same directory. (This is commonly known as
a "change set".) One checkin log per change set is adequate.
2b. Ability to back out an entire change set just as atomically as it
was applied.
2c. Ability to rename a file, including the ability for a file to have
different names on different branches.
2d. Automatically remember that a merge occurred from branch A to
branch B; later, when a second merge occurs from A to B, don't
apply those changes again.
2e. Understand the notion of a single-delta merge, either applying
just one change from branch A to branch B, or removing just one
change formerly on branch A ("subtractive merge").
2f. Perform conflict resolution by automatic formation of
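The heart of 2a/2b/2d can be sketched under one invented model: treat
a branch as the ordered list of change-set ids that have been applied
to it. (This is a toy illustration of the bookkeeping, not a merge
algorithm -- real merges must also apply the diffs.)

```python
# A branch is modeled, hypothetically, as the ordered list of
# change-set ids applied to it.

def merge(source, target):
    """2d: replay only what the target has not already seen, so a
    second merge from the same source applies nothing twice."""
    seen = set(target)
    new = [c for c in source if c not in seen]
    target.extend(new)
    return new

def back_out(branch, changeset_id):
    """2b: remove one logical change as a unit, as atomically as it
    was applied."""
    branch.remove(changeset_id)
```

With the merge history recorded this way, the repeated trunk-to-branch
merges described above stop re-applying (and re-conflicting on) changes
the branch already has.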
3. Should allow a user without commit privileges to generate a change
set, making arbitrary changes to the repository (none of this "you
can edit files and generate diffs but you can't add or delete
files" nonsense), which can be applied by a user who does have
commit privileges, and when the original author does an update
he/she doesn't get spurious conflicts.
4. The repository's on-disk data should be stored in a highly compact
format, to the maximum extent possible and consonant with being
fast. Being fast is much more important; however, GCC's CVS
repository is ~800MB in size and compresses down to ~100MB. You
can do interesting things (like keep a copy of the entire
repository on every developer's personal hard disk, as Bitkeeper
does) with a 100MB repository that are not so practical when it's
closer to a gigabyte.
5. Should have the ability to generate ChangeLog files automagically
from the checkin comments. (When merging to basic-improvements I
normally spend more time fixing up the ChangeLogs than anything
else. Except maybe waiting for 'cvs tag' and 'cvs update -j...'.)
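If checkin records carried structured fields (the field names below
are invented for the sketch -- date, author, log message, files
touched), emitting GNU-style ChangeLog entries would be a formatting
exercise:

```python
# Sketch of item 5: format one checkin record as a GNU-style
# ChangeLog entry.  All field names are hypothetical.

def changelog_entry(date, author, email, log, files):
    header = "%s  %s  <%s>" % (date, author, email)
    body = "\t* %s: %s" % (", ".join(files), log)
    return header + "\n\n" + body + "\n"
```

Merging would then regenerate the ChangeLog from the merged history
instead of three-way-merging the file itself, which is where most of
the hand-fixing currently goes.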