This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Acceptance criteria for the git conversion
- From: Joseph Myers <joseph at codesourcery dot com>
- To: "Eric S. Raymond" <esr at thyrsus dot com>
- Cc: Richard Earnshaw <Richard dot Earnshaw at foss dot arm dot com>, Mikhail Maltsev <maltsevm at gmail dot com>, <gcc at gcc dot gnu dot org>
- Date: Tue, 1 Sep 2015 18:03:23 +0000
- Subject: Re: Acceptance criteria for the git conversion
- References: <20150901105414 dot GA30270 at thyrsus dot com> <55E5B5A2 dot 7070509 at gmail dot com> <55E5B934 dot 1050307 at foss dot arm dot com> <alpine dot DEB dot 2 dot 10 dot 1509011703550 dot 11400 at digraph dot polyomino dot org dot uk> <20150901173356 dot GC3419 at thyrsus dot com>
On Tue, 1 Sep 2015, Eric S. Raymond wrote:
> Joseph Myers <joseph@codesourcery.com>:
> > Indeed. Ideally the tree objects in the git conversion should have
> > exactly the same contents as SVN commits, and so be shared with the
> > git-svn history to reduce the eventual repository size (except where there
> > are defects in the git-svn history, or the git conversion fixes up cvs2svn
> > artifacts and so some old revisions end up more accurately reflecting old
> > history than the SVN repository does).
>
> I don't think sharing with the git-svn history will be possible. git-svn
> is a terrible whole-history converter; the odds of getting the same
> topology out of reposurgeon are basically nil, and the problem of matching
> different topologies is quite hard.
I'm not proposing sharing topology (commit objects), only blob and tree
objects. Two files with the same contents have the same hash, so they share
one blob object; and if two trees contain the same blobs at the same paths
(with the same modes), the tree objects also have the same hash and will be
shared.
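A minimal sketch of the hashing behaviour described above, assuming a git installation with the default SHA-1 object format; the file name main.c and its contents are made up for illustration. It writes the same contents as a blob twice and builds two trees with that blob at the same path, using git hash-object and git mktree in a throwaway repository:

```shell
#!/usr/bin/env bash
# Sketch: identical contents give identical blob IDs, and identical
# (mode, name, object) entries give identical tree IDs.
# main.c and its contents are hypothetical.
set -euo pipefail

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Store the same contents twice; -w writes the blob into the object store.
blob1=$(printf 'int main (void) { return 0; }\n' | git hash-object -w --stdin)
blob2=$(printf 'int main (void) { return 0; }\n' | git hash-object -w --stdin)
echo "blob1=$blob1"
echo "blob2=$blob2"

# Build two trees holding that blob at the same path. git mktree reads
# ls-tree style lines: "<mode> <type> <object-id><TAB><path>".
tree1=$(printf '100644 blob %s\tmain.c\n' "$blob1" | git mktree)
tree2=$(printf '100644 blob %s\tmain.c\n' "$blob2" | git mktree)
echo "tree1=$tree1"
echo "tree2=$tree2"
```

Both pairs of IDs come out equal, which is why trees produced by two different converters can be stored once, provided paths, modes and contents match exactly.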
Now, git-svn may well have made mistakes, meaning that some trees in the
git-svn repository do not correspond exactly to any SVN revision of any
branch (so those objects aren't shared), but I'd expect most to be shared.
Even without disabling smart ignore handling, many tree objects for
subdirectories would be shared, as long as those subdirectories don't
contain any ignore files or svn:ignore properties.
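A similar sketch for the subdirectory case, again in a throwaway repository and with hypothetical names (libfoo, f.c): two top-level trees that differ only by a .gitignore file (standing in for one a converter might generate from svn:ignore) still reference one and the same libfoo tree object.

```shell
#!/usr/bin/env bash
# Sketch: top-level trees that differ only by a .gitignore entry still
# share the identical subdirectory tree object. Names are hypothetical.
set -euo pipefail

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# A subdirectory "libfoo" containing one file and no ignore file.
src_blob=$(printf 'void f (void) {}\n' | git hash-object -w --stdin)
subtree=$(printf '100644 blob %s\tf.c\n' "$src_blob" | git mktree)

# Top-level tree A: only the subdirectory.
top_a=$(printf '040000 tree %s\tlibfoo\n' "$subtree" | git mktree)

# Top-level tree B: the same subdirectory plus a generated .gitignore.
ignore_blob=$(printf '*.o\n' | git hash-object -w --stdin)
top_b=$(printf '100644 blob %s\t.gitignore\n040000 tree %s\tlibfoo\n' \
  "$ignore_blob" "$subtree" | git mktree)
echo "top_a=$top_a"
echo "top_b=$top_b"

# Look up the libfoo entry in each top-level tree.
sub_a=$(git rev-parse "$top_a:libfoo")
sub_b=$(git rev-parse "$top_b:libfoo")
echo "sub_a=$sub_a"
echo "sub_b=$sub_b"
```

The top-level IDs differ, but both resolve libfoo to the same tree object, so only the top-level tree needs to be stored twice.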
The point is that the git-svn repository has been in use for years, and
there are many git-only branches in it with lots of development on them, so
there are also many git commit references in list archives and elsewhere
which need to remain meaningful. While it would be possible to move the
existing repository to a different URI (or put the new repository at a
less obvious URI), it seems simpler to put both sets of objects (many of
them in common) in the same repository, with the refs from the git-svn
repository appropriately renamed so that its objects aren't
garbage-collected.
This isn't something for reposurgeon to do; it should be easy to do at the
pure git level. At a minimum, I think it might take just one command to add
the git-svn objects to a repository converted with reposurgeon. This is
untested, but should give an idea of what I'm thinking of:
git fetch git://gcc.gnu.org/git/gcc.git \
'refs/heads/*:refs/heads/git-old/*' \
'refs/remotes/*:refs/heads/git-svn-old/*' \
'refs/tags/*:refs/tags/git-old/*'
(OK, you'd also want to run git gc afterwards to repack the whole repository.)
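The same idea can be tried end to end as a sketch on two throwaway local repositories: "old" stands in for the git-svn mirror and "new" for a reposurgeon conversion (both hypothetical, as are the tag v1 and the refs/remotes/trunk ref). It fetches with the refspecs above and then repacks. It adds --no-tags, which is not in the command above, so that tags are only created under the renamed refs/tags/git-old/ namespace and none is auto-followed under its original name.

```shell
#!/usr/bin/env bash
# Sketch: fetch an "old" repository's refs into renamed namespaces of a
# "new" repository, then repack. All names here are hypothetical.
set -euo pipefail

work=$(mktemp -d)
cd "$work"

# Throwaway identity so commits work without user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# "old": one branch, one tag, and a refs/remotes ref as git-svn creates.
git init -q old
printf 'hello\n' > old/README
git -C old add README
git -C old commit -q -m 'old history'
git -C old tag v1
git -C old update-ref refs/remotes/trunk HEAD

# "new": an unrelated commit with the same file contents.
git init -q new
printf 'hello\n' > new/README
git -C new add README
git -C new commit -q -m 'new history'

# Fetch every old ref under a renamed prefix so nothing clashes with the
# new repository's own branches and tags, and the old objects stay
# reachable (and therefore safe from garbage collection).
git -C new fetch -q --no-tags "$work/old" \
  'refs/heads/*:refs/heads/git-old/*' \
  'refs/remotes/*:refs/heads/git-svn-old/*' \
  'refs/tags/*:refs/tags/git-old/*'

# Repack the combined repository.
git -C new gc --quiet

git -C new for-each-ref --format='%(refname)'
```

After the fetch, the README blob is the same object whether it is reached from the new history or from refs/tags/git-old/v1, so the repacked repository stores it only once.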
--
Joseph S. Myers
joseph@codesourcery.com