This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: Repository for the conversion machinery

From: Joseph Myers <joseph at codesourcery dot com>
To: "Eric S. Raymond" <esr at thyrsus dot com>
Cc: Jonathan Wakely <jwakely dot gcc at gmail dot com>, "Frank Ch. Eigler" <fche at redhat dot com>, Jason Merrill <jason at redhat dot com>, gcc Mailing List <gcc at gcc dot gnu dot org>
Date: Mon, 10 Oct 2016 21:53:08 +0000
Subject: Re: Repository for the conversion machinery
References: <alpine.DEB.2.20.1610061606450.737@digraph.polyomino.org.uk> <20161006171943.GA9036@thyrsus.com> <CADzB+2m788REAptOJjL1sYMOgHLCdJacFZFaL8=DRKM=vq1Q9w@mail.gmail.com> <alpine.DEB.2.20.1610062016110.21302@digraph.polyomino.org.uk> <CADzB+2=redKvY55hEMKoxHBpiAAFeNrcmNnZjCpLnGcnDee_9A@mail.gmail.com> <alpine.DEB.2.20.1610072056580.9580@digraph.polyomino.org.uk> <y0mint3yiff.fsf@fche.csb> <alpine.DEB.2.20.1610072111110.9580@digraph.polyomino.org.uk> <CAH6eHdSmmD8bUPz6p=vHojr-0u0YwPFpGPv-F+RpDim61uSj9g@mail.gmail.com> <alpine.DEB.2.20.1610101745210.25080@digraph.polyomino.org.uk> <20161010200843.GA16288@thyrsus.com>

On Mon, 10 Oct 2016, Eric S. Raymond wrote:

> I strongly recommend that if you want to try this, you separate it from the
> initial repo conversion.  That is, get the project to git first.  Then
> see if you can data-mine author information out of the history. If,
> and only if, you get results that look reasonable, then you patch the repo
> and force-push it, warning everyone there'll be a flag day.
> 
> The reason I recommend this is that I think you're going to have serious
> trouble getting clean authorship data with good coverage.  The data
> mining will be messy and take longer than you expect.

I also think it would be too messy, and I don't think such a flag day 
would be a good idea.  Once we've done the conversion we should keep 
commit ids stable.  The commit objects from the existing git mirror should 
live in a disjoint set of branches, not connected to the cleanly converted 
history (whether in a separate repository or not), so that existing 
references to those commit ids continue to work as well.  But I don't want 
to add a third set of commit ids for the same history.

In practice there are a lot of ways people have messed up ChangeLog 
commits or commit messages that I would expect to confuse such author 
extraction, even before you get to the parts of the history converted from 
CVS.

-- 
Joseph S. Myers
joseph@codesourcery.com

Follow-Ups:
- Re: Repository for the conversion machinery
  - From: Eric S. Raymond

References:
- Re: Repository for the conversion machinery
  - From: Joseph Myers
- Re: Repository for the conversion machinery
  - From: Eric S. Raymond
- Re: Repository for the conversion machinery
  - From: Jason Merrill
- Re: Repository for the conversion machinery
  - From: Joseph Myers
- Re: Repository for the conversion machinery
  - From: Jason Merrill
- Re: Repository for the conversion machinery
  - From: Joseph Myers
- Re: Repository for the conversion machinery
  - From: Frank Ch. Eigler
- Re: Repository for the conversion machinery
  - From: Joseph Myers
- Re: Repository for the conversion machinery
  - From: Jonathan Wakely
- Re: Repository for the conversion machinery
  - From: Joseph Myers
- Re: Repository for the conversion machinery
  - From: Eric S. Raymond