This is the mail archive of the
mailing list for the GCC project.
Re: Offer of help with move to git
- From: "Eric S. Raymond" <esr at thyrsus dot com>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Mon, 24 Aug 2015 12:31:57 -0400
- Subject: Re: Offer of help with move to git
- Authentication-results: sourceware.org; auth=none
- References: <20150823144340 dot GA7448 at thyrsus dot com> <alpine dot DEB dot 2 dot 10 dot 1508241350060 dot 19249 at digraph dot polyomino dot org dot uk>
- Reply-to: esr at thyrsus dot com
Joseph Myers <email@example.com>:
> Hence my suggestion in <https://gcc.gnu.org/ml/gcc/2015-08/msg00150.html>
> of reconverting and then combining with the existing git-svn history via
> renaming all the refs in the existing git repository, so as to preserve
> the validity of commit references and git-only branches there while having
> the main copy of the history properly converted.
Sorry, but I can't even imagine how to recombine in that way with the tools
I have. If you still think it's worth trying after seeing the reposurgeon
conversion I deliver, we can investigate that, I suppose.
> I don't know what either git-svn or reposurgeon make of the times when
> trunk was accidentally deleted and then recreated as an SVN copy of a
> pre-deletion revision (what we want to avoid for the proper conversion is
> those looking like deletion and recreation of all files in trunk - commits
> that don't change the tree at all, or complete omission of the deletion
> and subsequent recreation, would be fine).
git-svn often fluffs that general kind of delete-recreate case
pretty badly; reposurgeon's analyzer takes them in stride. I have
a whole bunch of regression tests from pathological repos that I keep
around to verify this.
Another similar case is when a branch was created by a non-SVN copy
followed by a commit, losing ancestry information - this is a
relatively common operator error that reposurgeon had to learn to cope
with early on. Most other translation tools (including git-svn) lose
their cookies here.
Hairballs like these are why reposurgeon has its own internal parser for
the SVN dumpfile format, the only one that exists outside the SVN
suite itself and the exception to the general rule that reposurgeon
consumes the fast-import-stream output of exporters in order to
read repositories. I couldn't achieve robustness in the presence
of common metadata malformations in any less drastic way.
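To make the delete-and-recreate case concrete, here is a minimal sketch of how one might spot it in an SVN dumpfile. This is not reposurgeon's parser or its analysis, only an illustration under stated assumptions: it relies on the standard dumpfile record headers (Revision-number, Node-path, Node-action, Node-copyfrom-path, Node-copyfrom-rev, Content-length), it handles none of the malformations discussed above, and the function names read_records and find_recreations are hypothetical.

```python
def read_records(stream):
    """Yield the header dict of each record in a binary SVN dump stream.

    Assumes well-formed input: a block of "Key: value" header lines, a blank
    line, then exactly Content-length bytes of body (which is skipped).
    """
    while True:
        line = stream.readline()
        if not line:
            return                      # end of file
        if not line.strip():
            continue                    # blank separator between records
        headers = {}
        while line.strip():
            text = line.decode("utf-8", errors="replace").rstrip("\n")
            key, _, value = text.partition(": ")
            headers[key] = value
            line = stream.readline()
        # Skip the body so property or file data is never mistaken for headers.
        stream.read(int(headers.get("Content-length", 0)))
        yield headers


def find_recreations(stream):
    """Return (path, deleted_rev, recreated_rev, copyfrom_rev) tuples for
    paths that were deleted and later re-added as a copy of themselves."""
    rev = None
    deleted = {}                        # path -> revision in which it was deleted
    found = []
    for headers in read_records(stream):
        if "Revision-number" in headers:
            rev = int(headers["Revision-number"])
            continue
        path = headers.get("Node-path")
        if path is None:
            continue                    # format-version or UUID record
        action = headers.get("Node-action")
        if action == "delete":
            deleted[path] = rev
        elif action == "add" and path in deleted:
            deleted_rev = deleted.pop(path)
            if headers.get("Node-copyfrom-path") == path:
                found.append((path, deleted_rev, rev,
                              int(headers["Node-copyfrom-rev"])))
    return found
```

Calling find_recreations on a dump opened in binary mode, for example open(path, "rb") with a path of your own, lists the candidate revisions. A real converter still has to decide what to do with them, such as emitting commits that leave the tree unchanged.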
> It was converted from CVS. More precisely, from two CVS repositories: the
> gcc2 repository (1988-1999, starting as a collection of RCS files and with
> not many files version controlled before 1992 and documentation not
> version controlled for years after then), and what started as the EGCS
> repository (1997-2005). The two repositories were combined by a custom
> version of CVS (work done by Ian Taylor) to produce the input to cvs2svn.
> gcc2 changes between the start of EGCS in 1997 and 1999 when development
> in the gcc2 project ended were moved to /branches/premerge-fsf-branch as
> part of the combination process (pre-EGCS gcc2 changes are on trunk).
Uh oh. This sounds like it could be a recipe for serious grief.
While Ian is certainly smart and persistent enough to have made
something coherent out of that kind of mess, older versions of cvs2svn
were defect amplifiers that would turn even minor metadata glitches in
CVS into large tracts of scar tissue in the translated SVN, which in
turn tend not to get noticed until you try to up-convert from the
SVN. Cleaning up this kind of artifact was one of the major original
motivations for reposurgeon.
The fact that you had to *combine* CVS repositories hints that I
may be about to encounter an entirely new class of malformations.
Oh joy, oh rapture... :-(
> A few branches in the repository that started as the EGCS repository, the
> history of which branches was particularly messed up by rebasing (branch
> tags having been moved from one revision to another, leaving behind
> unnamed branches), were deliberately omitted from the conversion to SVN to
> avoid it generating large amounts of very messy and not particularly
> useful history in the resulting repository.
I'll be glad not to have those problems...
We'll know soon enough how bad things are. It's taken me the better
part of three days to mirror the SVN, in part because your hosting site
is dropping the connection at random once every several hours, but I'm now
up to r208213, which is about 91% of the way to the end.
Once I have a complete mirror and can do a trial conversion, I'll be
able to run a 'lint' command that is pretty good at finding cvs2svn
artifacts.
I'll have to regenerate the empty contributor map, too. When I made the
first one I didn't know that mirroring had been interrupted by a host timeout;
I only had commits up to mid-2005.
The GCC repo is pretty huge, but I've been hunting mastodons like it
for years now - there's a row of trophy heads in the reposurgeon
documentation. I ended up building a machine with a processor and
cache specifically designed to handle non-parallelizable graph-theory
computations multiple gigabytes wide - SMP is no help here and you
want extra-large primary memory caches. On this hardware, conversion
runs will merely be painfully slow rather than die-of-old-age slow.
Eric S. Raymond <http://www.catb.org/~esr/>