[Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

Segher Boessenkool segher@kernel.crashing.org
Sun Jun 2 23:13:00 GMT 2019


On Fri, May 31, 2019 at 12:05:41AM +0000, Joseph Myers wrote:
> On Wed, 29 May 2019, Segher Boessenkool wrote:
> 
> > On Wed, May 29, 2019 at 12:53:30AM +0000, Joseph Myers wrote:
> > > On Fri, 24 May 2019, Segher Boessenkool wrote:
> > > 
> > > > IMO the best we can do is use what we already have: what CVS or SVN used
> > > > as the committer identity.  *That* info is *correct* at least.
> > > 
> > > CVS and SVN have a local identity.  git has a global identity.  I consider 
> > 
> > Git has an identity (well, two) _per commit_, and there is no way you can
> > reconstruct people's preferred name and email address (at any point in time,
> > for every commit separately) correctly.  IMO it is much better to not even
> > try.  We already *have* enough info for anyone to trivially look up who wrote
> > what, and what might be that person's email address at the time.  But
> > pretending that is more than a guess is just wrong.
> 
> I think not doing a best-effort identification (name+email) is just as 

And I think guessing is not a "best effort", but just wrong.

> wrong as converting a CVS repository to a changeset-based system without 
> doing a best-effort unification of commits to different files around the 
> same time with the same log message into changesets.  Both are the same 

These are not similar situations at all.  Converting something to an SVN-
like data model is necessary for the resulting repo to work acceptably;
guessing people's names and email addresses is just nice-to-have in the
best case, and misleading in other cases.

> sort of heuristic conversion of data to the form idiomatic for a different 
> version control system based around different concepts.  Neither is 

It's a single short line of text in SVN.  It is a single short line of text
in Git.  Both just show who wrote a patch, or who committed it.

Good luck finding out who was the primary author of every commit, btw.

> perfect, but the most useful conversion tries to combine CVS commits to 
> different files into changesets, and the most useful conversion tries to 
> identify authors in the way idiomatic for git using the information we 
> have about what person (globally) a given username on a given system 
> corresponds to.

We don't have that information.  This information can change over time,
and we never did track people's email addresses properly either.


Segher


