This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Proposal for the transition timetable for the move to GIT

On Wed, 8 Jan 2020, Eric S. Raymond wrote:

> They use your feedback to find places where their comment-processing
> scripts could be improved; we've used it learn what additional
> oddities in ChangeLogs we need to be able to handle automatically.

I've used comparisons of authors in the two conversions - in cases where 
they get different human identities for the author, not just different 
email addresses or name variants - to identify cases for manual review, 
since ChangeLog parsing is the most subjective part of doing a conversion 
and cases where different heuristics produce different results indicate 
those worthy of manual review.

Apart from about 1600 with no changes to ChangeLog files but a ChangeLog 
entry in the commit message, which I reviewed mostly automatically to make 
sure I agreed with Maxim's author extraction with only limited manual 
checks on those that looked like suspect cases, that involved reviewing 
around 3000 commits manually; I've now completed that review.  Some of 
those are also subjective cases even after review (for example, where the 
commit involved one person backporting another person's patch).

In the set of around 1200 commits with both ChangeLog and non-ChangeLog 
files being changed, which did not look like backports, for example, I 
arrived at around 400 author improvements from this review (not all of 
them the same authors as in Maxim's conversion), while for around 800 
commits I concluded the reposurgeon author was preferable.  (The typical 
case where reposurgeon does better is where successive commits add new 
ChangeLog entries under an existing ChangeLog header.  The typical case 
where I added fixes was where a commit made nonsubstantive changes under 
an existing header, as well as adding new entries, which is hard to 
distinguish automatically from a multi-author commit so reposurgeon 
conservatively treats as a multi-author commit.)

In the case of ChangeLog-only commits, where reposurgeon assumes they are 
likely to be fixing typos or similar and so does not extract an 
attribution from ChangeLog files in such commits, manual review identified 
many cases (especially in the earlier parts of the history) where the 
ChangeLog was committed separately from the substantive parts of the patch 
and so a better attribution could be assigned to those substantive 

I consider the reposurgeon-based conversion machinery to be in essentially 
its final state now; I don't have any further authors to review, Richard 
doesn't have any further Bugzilla-based commit summaries to review and we 
don't know of any relevant reposurgeon bugs or missing features.  I'm 
running a conversion now to verify both the current state of the fixups 
and the Makefile integration of the conversion and subsequent automated 
validation, and will make that converted repository available for final 
checks if this succeeds.  Compared to the previous converted repository, 
this one has many author fixups, a fix for a bug in the author fixups 
where they broke commit dates, and reposurgeon improvements to avoid 
producing unidiomatic empty git commits in the converted repository for 
things such as branch and tag creation.

This converted repository uses the ref rearrangements along the lines 
proposed by Richard (so dead branches and vendor branches are available 
but not fetched by default); the objects from the existing git mirror will 
also be included in the repository (so existing gitweb links to such 
objects in list archives continue to work, for example, as long as they 
aren't links to objects that were made unreachable at some point in the 
mirror's history), but again under ref names that are not fetched by 

As noted on overseers, once Saturday's DATESTAMP update has run at 00:16 
UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk 
and change the SVN hooks to make SVN readonly, then disable gccadmin's 
cron jobs that build snapshots and update online documentation until they 
are ready to run with the git repository.  Once the existing git mirror 
has picked up the last changes I'll make that read-only and disable that 
cron job as well, and start the conversion process with a view to having 
the converted repository in place this weekend (it could either be made 
writable as soon as I think it's ready, or left read-only until people 
have had time to do any final checks on Monday).  Before then, I'll work 
on hooks, documentation and maintainer-scripts updates.

As well as having objects from the existing git mirror available under 
refs that are not fetched by default, that mirror will remain available 
read-only at git:// (which already exists, 
currently a symlink to the mirror).

Joseph S. Myers

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]