This is the mail archive of the
mailing list for the GCC project.
Re: Fixing cvs2svn branchpoints
- From: "Eric S. Raymond" <esr at thyrsus dot com>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Fri, 1 Nov 2019 00:45:18 -0400
- Subject: Re: Fixing cvs2svn branchpoints
- References: <alpine.DEB.email@example.com> <alpine.DEB.firstname.lastname@example.org>
- Reply-to: esr at thyrsus dot com
Joseph Myers <email@example.com>:
> Here are complete lists of reparentings I think should be done on the
> commits that start branches, along with my notes on branches with messy
> initial commits but where I don't think any reparenting should be done.
> The REPARENT: lines have the meaning I described in
Please leave this as an issue on the gcc-conversion bugtracker.
Your timing is interesting. Happens I got my first full conversion
with the Go port of reposurgeon earlier today. I'm trying to verify
the conversion against the Subversion repository, but a full checkout
filled a filesystem on the EC2 instance I'm using. Recovery is
I'll do real benchmarks when I'm not staring at a deadline, but the
Go port is at least 20x faster than the Python was. That makes
the conversion practical, though it turns out the 128GB on my
desktop machine isn't enough to support it - hence the EC2 instance.
The first full conversion took eight hours. Turns out the single most
computationally expensive part of the surgery is data-mining ChangeLog
files for commit attributions. Today I threw massive parallelism at
the problem, that being something far easier to do in Go than in Python
- I think that might cut as much as two hours from the next run.
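
For readers wondering what "throwing parallelism" at ChangeLog mining might look like, here is a minimal sketch in Go. It is not reposurgeon's actual code. The Attribution type, the firstAttribution and mineParallel helpers, and the header regular expression are all assumptions made up for illustration, and the sketch only picks out the first GNU-style entry header ("DATE  NAME  <EMAIL>") in each ChangeLog text.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"sync"
)

// headerRE matches a GNU-style ChangeLog entry header such as
// "2019-10-31  Jane Hacker  <jane@example.com>".
// (Assumed format for this sketch; real ChangeLogs have more variants.)
var headerRE = regexp.MustCompile(`^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<([^>]+)>\s*$`)

// Attribution is a hypothetical record type for this sketch.
type Attribution struct {
	Date, Name, Email string
}

// firstAttribution returns the first entry header found in a ChangeLog
// text, or the zero Attribution if there is none.
func firstAttribution(text string) Attribution {
	for _, line := range strings.Split(text, "\n") {
		if m := headerRE.FindStringSubmatch(line); m != nil {
			return Attribution{Date: m[1], Name: m[2], Email: m[3]}
		}
	}
	return Attribution{}
}

// mineParallel scans every ChangeLog text using a bounded pool of
// goroutines. Each worker writes only to its own result slot, so no
// locking is needed beyond the WaitGroup.
func mineParallel(blobs []string, workers int) []Attribution {
	if workers < 1 {
		workers = 1
	}
	results := make([]Attribution, len(blobs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = firstAttribution(blobs[i])
			}
		}()
	}
	for i := range blobs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	blobs := []string{
		"2019-10-31  Jane Hacker  <jane@example.com>\n\n\t* foo.c: Fix typo.\n",
		"no entry header in this one\n",
	}
	for i, a := range mineParallel(blobs, 4) {
		fmt.Printf("blob %d: %+v\n", i, a)
	}
}
```

The point of the design is that each ChangeLog text can be parsed independently, so the work splits cleanly across goroutines with only a channel of indices and a WaitGroup for coordination.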
By going to the cloud I've gotten a larger working-set capacity at the
cost of some memory-access speed. Didn't want to do that, but
your repo is just too damn big for it to be otherwise, unless somebody
wants to drop cash on me to double the RAM in the Great Beast.
Your pile of requests is tricky but should be doable.
You had previously written:
>There are also cases where cvs2svn found a good branchpoint, but
>represented the branch-creation commit in a superfluously complicated
>way, replacing lots of files and subdirectories by copies of different
Yes, reposurgeon has logic to detect and deal with this automatically.
The assumption it makes is that the branch should root to the most
recent revision that CVS did a copy from. This is simple and seems to
give satisfactory results.
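
As a rough illustration of that rule, and not the actual reposurgeon logic, the selection can be sketched in Go as picking the copy source with the highest revision number among the copies in a branch-creation commit. The CopyOp type and the branchRoot function are hypothetical names invented for this sketch.

```go
package main

import "fmt"

// CopyOp is a hypothetical record of one copy operation found in a
// branch-creation commit: where it was copied from, and at what revision.
type CopyOp struct {
	FromPath string
	FromRev  int
}

// branchRoot returns the copy source with the most recent (highest)
// source revision. The second result is false if there were no copies.
func branchRoot(copies []CopyOp) (CopyOp, bool) {
	if len(copies) == 0 {
		return CopyOp{}, false
	}
	best := copies[0]
	for _, c := range copies[1:] {
		if c.FromRev > best.FromRev {
			best = c
		}
	}
	return best, true
}

func main() {
	// Made-up example data: a messy branch creation assembled from
	// several copies at different revisions.
	copies := []CopyOp{
		{FromPath: "trunk", FromRev: 1200},
		{FromPath: "trunk/gcc", FromRev: 1215},
		{FromPath: "trunk/libstdc++", FromRev: 1190},
	}
	if root, ok := branchRoot(copies); ok {
		fmt.Printf("root branch at %s@%d\n", root.FromPath, root.FromRev)
	}
}
```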
Which reminds me. I found a bunch of "svnmerge-integrated" properties
in the history. Should I treat those as though they were mergeinfo
properties and make branch merges from them?
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>