This is the mail archive of the mailing list for the GCC project.
Re: That light at the end of the tunnel?
- From: "Richard Earnshaw (lists)" <Richard dot Earnshaw at arm dot com>
- To: "Eric S. Raymond" <esr at thyrsus dot com>, GCC Development <gcc at gcc dot gnu dot org>, fallenpegasus at gmail dot com
- Date: Mon, 23 Jul 2018 14:20:37 +0100
- Subject: Re: That light at the end of the tunnel?
- References: <20180721020451.8E1C43A4AA7@snark.thyrsus.com>
On 21/07/18 03:04, Eric S. Raymond wrote:
> That light at the end of the tunnel turned out to be an oncoming train.
> Until recently I thought the conversion was near finished. I'd had
> verified clean conversions across trunk and all branches, except for
> one screwed-up branch that the management agreed we could discard.
> I had some minor issues left with execute-permission propagation and how
> to interpret mid-branch deletes. I solved the former and was working
> on the latter. I expected to converge on a final result well before
> the end of the year, probably in August or September.
> Then, as I reported here, my most recent test conversion produced
> incorrect content on trunk. That's very bad, because the sheer size
> of the GCC repository makes bug forensics extremely slow. Just loading
> the SVN dump file for examination in reposurgeon takes 4.5 hours; full
> conversions are back up to 9 hours now. The repository is growing
> about as fast as my ability to find speed optimizations.
> Then it got worse. I backed up to a commit that I remembered as
> producing a clean conversion, and it didn't. This can only mean that
> the reposurgeon changes I've been making to handle weird branch-copy
> cases have been fighting each other.
> For those of you late to the party, interpreting the operation
> sequences in Subversion dump files is simple and produces results that
> are easy to verify - except near branch copy operations. The way those
> interact with each other and other operations is extremely murky.
> There is *a* correct semantics defined by what the Subversion code
> does. But if any of the Subversion devs ever fully understood it,
> they no longer do. The dump format was never documented by them. It is
> only partly documented now because I reverse-engineered it. But the
> document I wrote has questions in it that the Subversion devs can't
> answer.
> It's not unusual for me to trip over a novel branch-copy-related
> weirdness while converting a repo. Normally the way I handle this is
> by performing a bisection procedure to pin down the bad commit. Then I:
> (1) Truncate the dump to the shortest leading segment that
> reproduces the problem.
> (2) Perform a strip operation that replaces all content blobs with
> unique small cookies that identify their source commit. Verify that it still
> reproduces...
> (3) Perform a topological reduce that drops out all uninteresting
> commits, that is, pure content changes not adjacent to any branch
> copies or property changes. Verify that it still reproduces...
> (4) Manually remove irrelevant branches with reposurgeon.
> Verify that it still reproduces...
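The sketch below is a minimal Python illustration of steps (2) and (3), the strip and the topological reduce. It assumes a toy model in which history is one linear list of revisions. `Commit`, its fields, `strip` and `topological_reduce` are hypothetical names invented for illustration. This is not reposurgeon's data model or its code.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Commit:
    """Hypothetical toy model of one Subversion revision (not reposurgeon's)."""
    rev: int
    blobs: dict = field(default_factory=dict)  # path -> file content
    branch_copy: bool = False                  # revision performs a branch copy
    prop_change: bool = False                  # revision changes svn properties


def strip(commits):
    """Step (2): replace every content blob with a small unique cookie
    that records which revision and path it came from."""
    return [
        replace(c, blobs={path: f"r{c.rev}:{path}" for path in c.blobs})
        for c in commits
    ]


def topological_reduce(commits):
    """Step (3): drop pure content changes that are not adjacent to a
    branch copy or property change. History is assumed to be linear, so
    'adjacent' means the previous or next revision in the list."""
    def interesting(c):
        return c.branch_copy or c.prop_change

    kept = []
    for i, c in enumerate(commits):
        window = commits[max(i - 1, 0):i + 2]  # previous, self, next
        if any(interesting(n) for n in window):
            kept.append(c)
    return kept
```

A real repository's history is a DAG rather than a list, so "adjacent" there would have to mean parent or child.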
> At this point I normally have a fairly small test repository (never,
> previously, more than 200 or so commits) that reproduces
> the issue. I watch conversions at increasing debug levels until I
> figure out what is going on. Then I fix it and the reduced dump
> becomes a new regression test.
> In this way I make monotonic progress towards a dumpfile analyzer
> that ever more closely emulates what the Subversion code is doing.
> It's not anything like easy, and gets less so as the edge cases I'm
> probing get more recondite. But until now it worked.
> The size of the GCC repository defeats this strategy. By back of the
> envelope calculation, a single full bisection would take a minimum of
> 18 days. Realistically it would probably be closer to a month.
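The bisection itself is ordinary binary search over revisions, and each probe costs one full test conversion. This is a hedged Python sketch. It assumes the first revision is known good, the last is known bad, and breakage is monotonic (once bad, every later revision stays bad). `first_bad` and `serial_bisect_hours` are hypothetical helpers, and the cost model ignores everything except conversion time.

```python
import math


def first_bad(revisions, is_bad):
    """Binary search for the first bad revision.

    Assumes len(revisions) >= 2, revisions[0] is good, revisions[-1] is bad,
    and badness is monotonic. Returns (first bad revision, probes used)."""
    lo, hi = 0, len(revisions) - 1  # invariant: lo is good, hi is bad
    probes = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes += 1                 # one probe = one full test conversion
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi], probes


def serial_bisect_hours(n_revisions, hours_per_probe):
    """Worst-case elapsed time: ceil(log2(n - 1)) probes, run one after another."""
    return math.ceil(math.log2(n_revisions - 1)) * hours_per_probe
```

The number of probes grows only logarithmically with the number of revisions. The elapsed time is therefore dominated by the cost of a single conversion, which is why each multi-hour run matters so much.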
So traditional git bisect is inherently serial, but we can be more
creative here, surely. A single run halves the search space each time.
But three machines working together can split it into 4 each run, 7
machines into 8, etc. You don't even need exactly 2^N - 1 machines to get a
benefit.
It's not as efficient computationally as running on a single machine,
but it can be more efficient in terms of elapsed time.
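The scheme above generalizes binary search to (k+1)-ary search. Each round, k machines test k roughly evenly spaced revisions at once, so the interval shrinks by a factor of about k+1. The number of rounds drops from about log2 N to about log_(k+1) N. This sketch makes the same assumptions as serial bisection. `parallel_first_bad` is a hypothetical name, and the thread pool is only an in-process stand-in for what would really be separate machines, each running a full conversion.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_first_bad(revisions, is_bad, machines):
    """(k+1)-ary bisection: each round tests up to `machines` split points
    concurrently. Assumes revisions[0] good, revisions[-1] bad, and
    monotonic badness. Returns (first bad revision, rounds used)."""
    lo, hi = 0, len(revisions) - 1  # invariant: lo is good, hi is bad
    rounds = 0
    with ThreadPoolExecutor(max_workers=machines) as pool:
        while hi - lo > 1:
            gap = hi - lo
            # Up to `machines` distinct interior points, roughly evenly spaced.
            points = sorted({lo + (gap * j) // (machines + 1)
                             for j in range(1, machines + 1)} - {lo, hi})
            # Stand-in for dispatching one conversion per machine.
            results = list(pool.map(lambda i: is_bad(revisions[i]), points))
            rounds += 1
            # Narrow to the gap between the last good and first bad probe.
            for i, bad in zip(points, results):
                if bad:
                    hi = i
                    break
                lo = i
    return revisions[hi], rounds
```

Total work is up to k conversions per round, so CPU time goes up while elapsed time goes down. Any k >= 1 narrows the interval each round.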
We just need to find some way of divvying up the work and then find machines
that are capable of running the job. They don't have to be 'big beasts';
they just need enough RAM and must not be so puny that they hold up the
overall process.
Think seti@home for git bisect....
Surely collectively we can solve this problem...
> That means that, under present assumptions, it's game over
> and we've lost. The GCC repo is just too large and weird.
> My tools need to get a lot faster, like more than an order of
> magnitude faster, before digging out of the bad situation the
> conversion is now in will be practical.
> Hardware improvements won't do that. Nobody knows how to build a
> machine that can crank a single process enough faster than 1.3GHz.
> And the problem doesn't parallelize.
> There is a software change that might do it. I have been thinking
> about translating reposurgeon from Python to Go. Preliminary
> experiments with a Go version of repocutter show that it has a
> 40x speed advantage over the Python version. I don't think I'll
> get quite that much speedup on reposurgeon, but I'm pretty
> optimistic about getting enough speedup to make debugging the GCC
> conversion tractable. Even at half that, 9 hour test runs would
> collapse to 13 minutes.
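For reference, here is the arithmetic behind those figures, taking a 9-hour run as 540 minutes:

```latex
\frac{540\ \text{min}}{40} = 13.5\ \text{min}, \qquad
\frac{540\ \text{min}}{20} = 27\ \text{min}
```

So 13 minutes corresponds to the full 40x speedup measured on repocutter. At half that (20x), a 9-hour run would come down to about 27 minutes.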
> The problem with this plan is that a full move to Go will be very
> difficult. *Very* difficult. As in, work time in an unknown and
> possibly large number of months.
> GCC management will have to make a decision about how patient
> it is willing to be. I am at this point not sure it wouldn't be
> better to convert your existing tree state and go from there, keeping
> the Subversion history around for archival purposes.