Proposal for the transition timetable for the move to GIT
Mon Dec 9 20:45:00 GMT 2019
On Mon, 9 Dec 2019, Bernd Schmidt wrote:
> On 12/9/19 7:19 PM, Joseph Myers wrote:
> > For any conversion we're clearly going to need to run various validation
> > (comparing properties of the converted repository, such as contents at
> > branch tips, with expected values of those properties based on the SVN
> > repository) and fix issues shown up by that validation. reposurgeon has
> > its own tools for such validation; I also intend to write some validation
> > scripts myself.
> Would it be feasible to require that both conversions produce the same output
> repository to some degree? Can we just look at release tags and require that
> they have the same hash in both conversions, or are there good reasons why the
> two would produce different outputs?
Requiring the same hashes is not practical. There are several areas where
two perfectly correct conversions are still expected to have different
contents because of subjective decisions and heuristics involved in the
conversion.
If some alternative heuristic is found to be clearly better than an
existing one in reposurgeon, so that it would be better for any project
converting with reposurgeon, or if some preference in the GCC case can
readily be represented as a configuration option to choose between
different approaches, it makes sense to implement the improvements in
reposurgeon so that any project with similar issues can benefit. For
example, see Richard's suggestions in reposurgeon issue 174 of two
possible improvements to ChangeLog handling: disregarding ChangeLog data
if a commit adds multiple ChangeLog entries by different authors, and
specifying a wildcard to allow ChangeLog processing on ChangeLog* files to
cover ChangeLog.<branch>. GCC is hardly the last project converting from
SVN to git, so we can benefit from the experiences of past conversions,
and help contribute to having useful features available for future
conversions.
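To illustrate the wildcard idea only: the sketch below uses Python's
fnmatch module to show how a shell-style pattern such as ChangeLog* would
also cover per-branch files like ChangeLog.<branch>. The function name and
the example paths are hypothetical, and this is not how reposurgeon
implements or spells the option.

```python
import fnmatch
import posixpath


def is_changelog(path, pattern="ChangeLog*"):
    """Return True if the file name part of `path` matches `pattern`.

    Hypothetical helper for illustration; matching is case-sensitive and
    applied to the base name only, not to the directory part.
    """
    return fnmatch.fnmatchcase(posixpath.basename(path), pattern)


# Example paths are made up for illustration.
for p in ("gcc/ChangeLog", "gcc/ChangeLog.gomp", "gcc/NEWS"):
    print(p, is_changelog(p), is_changelog(p, "ChangeLog"))
```

With the exact name "ChangeLog" as the pattern, only the first path
matches; with the wildcard, the per-branch file matches as well.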
Here are some cases for differences between two correct conversions:
* Tree contents should mostly be identical at any given commit, but
reposurgeon deliberately produces a .gitignore with contents based on
svn:ignore if the SVN tree contents don't have a .gitignore (we use
--user-ignores to prefer the .gitignore file in SVN if it exists), and
removes any .cvsignore file.
* The first parent of a commit should typically be the same between
conversions, but (a) might be corrected in some way for cvs2svn issues,
(b) might skip SVN commits that would translate into empty git commits,
depending on the choices made for handling of such commits.
* Cases that give rise to no tree changes in a commit (which thus might
not become a git commit at all depending on the choices made and whether
they also don't change any merge information properties) include (a)
branch or tag creation as an exact copy of some revision of some branch,
(b) branch recreation as a copy, e.g. when trunk was deleted accidentally,
(c) commits that in SVN only add or remove empty directories, as git does
not store empty directories, (d) commits that in SVN just remove some file
or directory and replace it with a copy from some revision of some branch
that happens to have identical contents to the file or directory removed
(yes, we do have commits like that in GCC SVN).
* Subsequent parents of a commit based on merge info handling may well
have subjective differences between correct conversions.
* Commit messages might differ, both because of heuristics to improve
them, like Richard's work on that, and because of different choices for
how to represent the SVN revision number information in commit messages.
* Author and committer identifications, and commit timestamps (especially
timezones, something git has, SVN doesn't and reposurgeon has a per-author
map for) may vary because of different heuristics or author maps used,
especially when there is no ChangeLog entry for a commit or the ChangeLog
entry is in some way malformed or the commit adds ChangeLog entries for
multiple changes with different authors.
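Given the differences listed above, a validation script comparing two
conversions would have to compare tree contents rather than commit hashes,
and disregard the ignore files. The following is a rough sketch of that
idea, not one of the actual validation scripts. It assumes each tree has
already been read into a mapping from path to blob hash (for example by
parsing the output of "git ls-tree -r" for the same branch tip or tag in
each repository); the function names are hypothetical.

```python
# File names expected to differ between otherwise correct conversions.
IGNORED_NAMES = {".gitignore", ".cvsignore"}


def normalized(tree):
    """Drop ignore files from a {path: blob_hash} mapping."""
    return {
        path: blob
        for path, blob in tree.items()
        if path.rsplit("/", 1)[-1] not in IGNORED_NAMES
    }


def tree_diff(tree_a, tree_b):
    """Return the sorted paths whose contents differ between two trees.

    A path present in only one tree counts as a difference.
    """
    a, b = normalized(tree_a), normalized(tree_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))


# Made-up data: the trees differ only in their ignore files.
conv1 = {"gcc/foo.c": "1111", ".gitignore": "aaaa"}
conv2 = {"gcc/foo.c": "1111", "gcc/.cvsignore": "bbbb"}
print(tree_diff(conv1, conv2))
```

An empty result means the two trees agree apart from .gitignore and
.cvsignore files. Parents, commit messages, authors and timestamps would
need separate, more tolerant checks.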
Joseph S. Myers