This is the mail archive of the gcc-patches mailing list for the GCC project.
Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git
- From: Maxim Kuvyrkov <maxim dot kuvyrkov at linaro dot org>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Cc: Jason Merrill <jason at redhat dot com>, Paolo Bonzini <pbonzini at redhat dot com>
- Date: Tue, 16 Jul 2019 13:17:57 +0300
- Subject: Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git
- References: <E8A06A10-5BBC-4C2F-9C09-D5413B98D2DC@linaro.org> <8C62F814-2F57-4D1A-B66F-5C5ACFF37D6C@linaro.org>
I've been swamped with other projects for most of June, which gave me time to digest all the feedback I've got on GCC's conversion from SVN to Git.
The scripts have evolved heavily from the initial version posted here. They have become fairly generic in that they have no implied knowledge about GCC's repo structure. Because of this I no longer plan to merge them into the GCC tree, but rather to publish them as a separate project on GitHub. For now, you can track the current [hairy] version at https://review.linaro.org/c/toolchain/gcc/+/31416 .
The initial version of the scripts used heuristics to construct the branch tree, which turned out to be error-prone. The current version parses the entire history of the SVN repo to detect all trees that start at /trunk@1. Therefore all branches in the converted repo converge to the same parent at the beginning of their histories.
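To illustrate the idea of tracing every branch back to a common root, here is a minimal Python sketch. It is not the actual script: the `copies` table, the function names, and the revision numbers are all made up for illustration, and it assumes each path has a single copy-from record (real history, with deleted and re-created branches, is messier).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input: for each branch path, the (path, revision) it was
# copied from, or None if it was created from scratch. In practice this
# could be extracted from the copy-from records shown by `svn log -v`.
CopyFrom = Dict[str, Optional[Tuple[str, int]]]


def root_of(branch: str, copies: CopyFrom) -> str:
    """Follow copy-from links until reaching a path with no parent."""
    seen = set()
    while copies.get(branch) is not None:
        if branch in seen:  # defensive: malformed input containing a cycle
            raise ValueError(f"copy cycle at {branch}")
        seen.add(branch)
        branch = copies[branch][0]
    return branch


def branches_rooted_at_trunk(copies: CopyFrom) -> List[str]:
    """Return the branches whose ancestry leads back to /trunk, sorted."""
    return sorted(
        b for b in copies if b != "/trunk" and root_of(b, copies) == "/trunk"
    )


# Illustrative data only; the revision numbers are invented.
copies: CopyFrom = {
    "/trunk": None,
    "/branches/gcc-9-branch": ("/trunk", 1000),
    "/branches/ibm/gcc-9-branch": ("/branches/gcc-9-branch", 1200),
    "/branches/orphan": None,
}
print(branches_rooted_at_trunk(copies))
```

A branch such as `/branches/orphan` above, which never traces back to `/trunk`, would simply be left out of the result.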
As far as GCC conversion goes, below is what I plan to do and what not to do. This is based on comments from everyone in this thread:
1. Construct GCC's git repo from SVN using the same settings as the current git mirror.
2. Compare the resulting git repo with the current GCC mirror -- they should match on the commit hash level for trunk, branches/gcc-*-branch, and other "normal" branches.
3. Investigate any differences between the converted GCC repo and the current GCC mirror. These can be due to bugs in git-svn or other misconfigurations.
4. Import git-only branches from the current GCC mirror.
5. Publish this "raw" repo for the community to sanity-check its contents.
6. Re-write history of all branches -- converted from svn and git-only -- see note below [*].
7. Publish this "pretty" repo for the community to sanity-check its contents.
8. Update both "raw" and "pretty" repos daily with new commits.
9. Fix problems in the "raw" and "pretty" repos as they are reported by the community.
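The hash-level comparison in steps 2 and 3 could be sketched roughly as follows. This is only an illustration: it assumes each repo's branch tips have been dumped as "<refname> <hash>" lines (for example with `git for-each-ref --format='%(refname) %(objectname)'`), and the function names and sample hashes are invented.

```python
from typing import Dict, List


def parse_refs(text: str) -> Dict[str, str]:
    """Parse lines of the form '<refname> <hash>' into a dict."""
    refs = {}
    for line in text.splitlines():
        if line.strip():
            name, sha = line.split()
            refs[name] = sha
    return refs


def diff_refs(converted: Dict[str, str], mirror: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify branch names by how the two repos disagree."""
    shared = converted.keys() & mirror.keys()
    return {
        # Present in both repos, but the tip commits differ.
        "mismatch": sorted(r for r in shared if converted[r] != mirror[r]),
        # Present in only one of the two repos.
        "only_converted": sorted(converted.keys() - mirror.keys()),
        "only_mirror": sorted(mirror.keys() - converted.keys()),
    }


# Shortened, made-up hashes for illustration.
converted = parse_refs(
    "refs/heads/trunk aaa111\n"
    "refs/heads/gcc-9-branch bbb222\n"
    "refs/heads/new ccc333\n"
)
mirror = parse_refs(
    "refs/heads/trunk aaa111\n"
    "refs/heads/gcc-9-branch bbb999\n"
)
report = diff_refs(converted, mirror)
print(report)
```

Since a commit hash covers the whole ancestry of a commit, matching branch tips imply matching histories, so only the branches listed under "mismatch" would need a commit-by-commit investigation.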
Once these steps are done, the community could switch from SVN to git by disabling commits to SVN, waiting for the final history to be absorbed by the "pretty" repo, and deploying the git repo as the official repo.
[*] Note on branch re-writing:
During svn->git conversion we have an opportunity to correct some of the artifacts of the current git mirror:
a. Author and committer entries. These are difficult to get right during the git-svn import process because the tool gives only the SVN committer ID without much else. We could do much better by matching the SVN committer ID with the person's name in the map file, and then searching for the person's current-at-the-time email address in the commit diff. I.e., mkuvyrkov -> Maxim Kuvyrkov -> [changelog from 2010's commit] -> firstname.lastname@example.org .
b. Re-write tags/ branches into annotated tags. Note that tags/* are included in the history of several branches via merge or copy commits, so we would need to re-write history to have proper references to annotated tag commits in the histories of such branches.
c. Since we are re-writing history anyway, it would be nice to convert "svn-git: svn+ssh://" tags to "svn-git: https://". We are sure to retain a publicly visible svn repo accessible via https://, but not as likely to retain the svn+ssh:// interface.
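The lookup chain in item (a) could look roughly like the Python sketch below. It assumes the map file has already been loaded into a dict, and that the commit diff adds a ChangeLog header line of the usual "date, name, <email>" shape. The function name, the regular expression, and the sample address are illustrative assumptions, not part of the actual scripts.

```python
import re
from typing import Dict, Optional

# Matches an added ChangeLog header line in a diff, e.g.
# "+2010-03-15  Some Name  <someone@example.org>".
CHANGELOG_HEADER = re.compile(
    r"^\+(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<([^<>\s]+@[^<>\s]+)>\s*$"
)


def find_email(svn_id: str, authors: Dict[str, str], diff_text: str) -> Optional[str]:
    """Map an SVN committer ID to a name, then look for that name's
    email address in ChangeLog headers added by the commit."""
    name = authors.get(svn_id)
    if name is None:
        return None
    for line in diff_text.splitlines():
        m = CHANGELOG_HEADER.match(line)
        if m and m.group(2) == name:
            return m.group(3)
    return None  # the caller would need a fallback address


# Hypothetical map file contents and commit diff (placeholder address).
authors = {"mkuvyrkov": "Maxim Kuvyrkov"}
diff_text = "\n".join([
    "+2010-03-15  Maxim Kuvyrkov  <maxim@example.org>",
    "+",
    "+\t* config/foo.c: Fix typo.",
])
print(find_email("mkuvyrkov", authors, diff_text))
```

Commits that do not touch a ChangeLog, or that were committed on behalf of someone else, would need a fallback, for example a default address per committer in the map file.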
Which of these will make it into the final repo is for the community to decide.
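For item (c), the rewrite of each commit message could be as small as the sketch below. The trailer name is a parameter; the default used here is the `git-svn-id:` line that git-svn appends to commit messages, which is an assumption about which tag is meant. The host, path, and UUID in the example are placeholders, and the sketch assumes the https:// URL has the same host and path as the svn+ssh:// one, which may not hold for a real server.

```python
import re


def rewrite_trailer(message: str, trailer: str = "git-svn-id") -> str:
    """Replace 'svn+ssh://[user@]' with 'https://' on trailer lines only."""
    pattern = re.compile(
        rf"^({re.escape(trailer)}:\s*)svn\+ssh://(?:[^@/\s]+@)?",
        flags=re.MULTILINE,
    )
    return pattern.sub(r"\1https://", message)


# Placeholder commit message; URL and UUID are invented.
message = (
    "Fix typo.\n"
    "\n"
    "git-svn-id: svn+ssh://user@svn.example.org/repo/trunk@42 "
    "00000000-0000-0000-0000-000000000000"
)
print(rewrite_trailer(message))
```

Anchoring the pattern to the trailer line keeps any svn+ssh:// URLs mentioned in the body of a commit message untouched.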
> On May 28, 2019, at 1:31 PM, Maxim Kuvyrkov <email@example.com> wrote:
> Hi Everyone,
> What can I say, I was too optimistic about how easy it would be to convert GCC's svn repo to git one branch at a time. After 2 more weeks and several re-writes of the scripts I now know more about GCC's svn history than I ever wanted to.
> The prize for most complicated branch history goes to /branches/ibm/* . It has merges, it has re-creation of branches from /trunk and even an accidental deletion of all of IBM's branches.
> The version of scripts I'm testing right now seems to deal with all of that.
> Also, to avoid controversy -- I'm working on these scripts to satisfy my own curiosity, and to give the GCC community another option to choose from for the final migration. If by the end of Summer 2019 we have 2-3 git repos to choose from, then we are likely to push GCC [kicking and screaming] into the 2010s by the end of this decade.
> Maxim Kuvyrkov
>> On May 14, 2019, at 7:11 PM, Maxim Kuvyrkov <firstname.lastname@example.org> wrote:
>> This patch adds scripts to contrib/ to migrate the full history of GCC's subversion repository to git. My hope is that these scripts will finally allow the GCC project to migrate to Git.
>> The result of the conversion is at https://github.com/maxim-kuvyrkov/gcc/branches/all . Branches with "@rev" suffixes represent branch points. The conversion is still running, so not all branches may appear right away.
>> The scripts are not specific to the GCC repo and are usable for other projects. In particular, they should be able to convert downstream GCC svn repos.
>> The scripts convert svn history branch by branch. They rely on git-svn to convert individual branches. Git-svn is a good tool for converting individual branches. It is, however, either very slow at converting the entire GCC repo, or goes into an infinite loop.
>> There are 3 scripts:
>> - svn-git-repo.sh: top level script to convert entire repo or a part of it (e.g., branches/),
>> - svn-list-branches.sh: helper script to output branches and their parents in bottom-up order,
>> - svn-git-branch.sh: helper script to convert a single branch.
>> Whenever possible, svn-git-branch.sh uses existing git branches as caches.
>> What are your questions and comments?
>> The attached is a cleaned-up version, which hasn't been fully tested yet; typos and other silly mistakes are likely. OK to commit after testing?
>> Maxim Kuvyrkov