This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Proposal for the transition timetable for the move to GIT
- From: Joseph Myers <joseph at codesourcery dot com>
- To: "Eric S. Raymond" <esr at thyrsus dot com>
- Cc: Maxim Kuvyrkov <maxim dot kuvyrkov at linaro dot org>, "Richard Earnshaw (lists)" <Richard dot Earnshaw at arm dot com>, GCC Development <gcc at gcc dot gnu dot org>, Alexandre Oliva <oliva at gnu dot org>, Jeff Law <law at redhat dot com>, Segher Boessenkool <segher at kernel dot crashing dot org>, Mark Wielaard <mark at klomp dot org>, Jakub Jelinek <jakub at redhat dot com>
- Date: Wed, 8 Jan 2020 23:34:32 +0000
- Subject: Re: Proposal for the transition timetable for the move to GIT
- References: <alpine.DEB.2.21.1912261054320.27097@digraph.polyomino.org.uk> <20191226111633.GJ10088@tucnak> <5DCEA32B-3E36-4400-B931-9F4E2A8F3FA5@linaro.org> <155B5BFD-6ECF-4EBF-A38C-D6DD178FB497@linaro.org> <2b6330f2-1a00-ac89-fd3c-4b70e5454f4b@arm.com> <9B71A0F7-CD93-4636-BFC7-1D1DBC040F07@linaro.org> <ba0755ff-496c-4d7b-268c-b0d5dab1aa9e@arm.com> <6EE7BD53-6677-49D2-BCDD-56CD7DA855E9@linaro.org> <f767537a-4a9a-4116-9832-fe83cfb16914@arm.com> <88B4DAF3-33C1-445F-8F5A-809D5463D0F9@linaro.org> <20200108221119.GA94728@thyrsus.com>
On Wed, 8 Jan 2020, Eric S. Raymond wrote:
> They use your feedback to find places where their comment-processing
> scripts could be improved; we've used it to learn what additional
> oddities in ChangeLogs we need to be able to handle automatically.
I've used comparisons of authors in the two conversions - in cases where
they get different human identities for the author, not just different
email addresses or name variants - to identify cases for manual review,
since ChangeLog parsing is the most subjective part of doing a conversion
and cases where different heuristics produce different results indicate
those worthy of manual review.
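
As an illustration of that comparison, here is a minimal sketch in
Python. It is not the script actually used: the data layout (a mapping
from revision to a name/email pair for each conversion), the alias table
and the function names are all assumptions made for the example. The idea
is only that two attributions count as different when they resolve to
different people, not when they differ merely in address or spelling.

```python
from typing import Dict, List, Tuple

Author = Tuple[str, str]  # (name, email); hypothetical layout


def canonical(author: Author, aliases: Dict[str, str]) -> str:
    """Reduce an attribution to a single human identity.

    `aliases` is an assumed, hand-maintained table mapping known email
    addresses to one canonical name per person.
    """
    name, email = author
    key = email.strip().lower()
    if key in aliases:
        return aliases[key]
    # Fall back to the name with case and spacing normalised.
    return " ".join(name.lower().split())


def needs_review(conv_a: Dict[str, Author],
                 conv_b: Dict[str, Author],
                 aliases: Dict[str, str]) -> List[str]:
    """Return revisions present in both conversions whose authors
    resolve to different people."""
    flagged = []
    for rev in sorted(conv_a.keys() & conv_b.keys()):
        if canonical(conv_a[rev], aliases) != canonical(conv_b[rev], aliases):
            flagged.append(rev)
    return flagged


if __name__ == "__main__":
    # Invented sample data, not taken from either real conversion.
    aliases = {"jane@example.org": "jane doe",
               "jdoe@example.com": "jane doe"}
    a = {"r1": ("Jane Doe", "jane@example.org"),
         "r2": ("Jane Doe", "jane@example.org")}
    b = {"r1": ("J. Doe", "jdoe@example.com"),
         "r2": ("John Roe", "john@example.net")}
    print(needs_review(a, b, aliases))
```

With the invented sample data, only the revision where the two
conversions name different people is flagged; the one that differs just
in address and name variant is not.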
Reviewing those author differences meant checking around 3000 commits
manually; I've now completed that review. (That figure excludes about
1600 commits with no changes to ChangeLog files but a ChangeLog entry in
the commit message; I reviewed those mostly automatically to make sure I
agreed with Maxim's author extraction, with only limited manual checks on
those that looked like suspect cases.) Some of the manually reviewed
commits remain subjective even after review (for example, where the
commit involved one person backporting another person's patch).
For example, in the set of around 1200 commits that changed both
ChangeLog and non-ChangeLog files and did not look like backports, I
arrived at around 400 author improvements from this review (not all of
them the same authors as in Maxim's conversion), while for around 800
commits I concluded the reposurgeon author was preferable. (The typical
case where reposurgeon does better is where successive commits add new
ChangeLog entries under an existing ChangeLog header. The typical case
where I added fixes was where a commit made nonsubstantive changes under
an existing header, as well as adding new entries; that is hard to
distinguish automatically from a multi-author commit, so reposurgeon
conservatively treats it as a multi-author commit.)
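
To make the ambiguity concrete, here is a small sketch in Python of the
kind of heuristic involved. It is an assumption-laden illustration, not
reposurgeon's actual algorithm: it supposes the usual ChangeLog header
form (date, name, address in angle brackets), takes the post-commit
ChangeLog as a list of lines plus the set of line indices the commit
added or changed, and reports every header under which a touched line
falls. The function name and sample entries are invented.

```python
import re
from typing import List, Set

# Assumed header form: "YYYY-MM-DD  Name  <email>"
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<([^>]+)>\s*$")


def authors_for_touched_lines(changelog_lines: List[str],
                              touched: Set[int]) -> List[str]:
    """List, in order of appearance, the header authors whose sections
    contain at least one touched non-blank line."""
    authors: List[str] = []
    current = None
    for i, line in enumerate(changelog_lines):
        m = HEADER.match(line)
        if m:
            current = f"{m.group(2)} <{m.group(3)}>"
        if (i in touched and line.strip()
                and current is not None and current not in authors):
            authors.append(current)
    return authors


if __name__ == "__main__":
    log = [
        "2020-01-08  Alice Example  <alice@example.org>",
        "",
        "\t* foo.c (bar): New function.",
        "",
        "2020-01-07  Bob Example  <bob@example.org>",
        "",
        "\t* baz.c (qux): Fix typo.",
    ]
    # New entry added under an existing header: one author.
    print(authors_for_touched_lines(log, {2}))
    # New entry plus a trivial edit under an older header: two authors,
    # which looks the same as a genuine multi-author commit.
    print(authors_for_touched_lines(log, {0, 1, 2, 6}))
```

The second call shows why a purely textual rule cannot tell a
nonsubstantive edit under someone else's header from a real second
author, which is where manual review was needed.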
In the case of ChangeLog-only commits, where reposurgeon assumes they are
likely to be fixing typos or similar and so does not extract an
attribution from ChangeLog files in such commits, manual review identified
many cases (especially in the earlier parts of the history) where the
ChangeLog was committed separately from the substantive parts of the patch
and so a better attribution could be assigned to those substantive
commits.
I consider the reposurgeon-based conversion machinery to be in essentially
its final state now; I don't have any further authors to review, Richard
doesn't have any further Bugzilla-based commit summaries to review and we
don't know of any relevant reposurgeon bugs or missing features. I'm
running a conversion now to verify both the current state of the fixups
and the Makefile integration of the conversion and subsequent automated
validation, and will make that converted repository available for final
checks if this succeeds. Compared to the previous converted repository,
this one has many author fixups, a fix for a bug in the author fixups
where they broke commit dates, and reposurgeon improvements to avoid
producing unidiomatic empty git commits in the converted repository for
things such as branch and tag creation.
This converted repository uses the ref rearrangements along the lines
proposed by Richard (so dead branches and vendor branches are available
but not fetched by default); the objects from the existing git mirror will
also be included in the repository (so existing gitweb links to such
objects in list archives continue to work, for example, as long as they
aren't links to objects that were made unreachable at some point in the
mirror's history), but again under ref names that are not fetched by
default.
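
For readers unfamiliar with why some refs are "not fetched by default":
a fresh clone is configured with the fetch refspec
+refs/heads/*:refs/remotes/origin/*, so refs outside refs/heads/ (tags
aside) are only fetched if an extra refspec is added. The Python sketch
below models that matching for a single-wildcard refspec. The dead-branch
ref name and the extra refspec in it are hypothetical, chosen only for
illustration; they are not the names used in the converted repository.

```python
from typing import Optional

# The refspec git configures for "origin" on a fresh clone.
DEFAULT_REFSPEC = "+refs/heads/*:refs/remotes/origin/*"


def map_ref(refspec: str, remote_ref: str) -> Optional[str]:
    """Return the local ref that `remote_ref` maps to under `refspec`,
    or None if the refspec does not select it."""
    src, dst = refspec.lstrip("+").split(":", 1)
    if "*" not in src:
        return dst if remote_ref == src else None
    prefix, suffix = src.split("*", 1)
    if (len(remote_ref) >= len(prefix) + len(suffix)
            and remote_ref.startswith(prefix)
            and remote_ref.endswith(suffix)):
        middle = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
        return dst.replace("*", middle, 1)
    return None


if __name__ == "__main__":
    # An ordinary branch is selected by the default refspec.
    print(map_ref(DEFAULT_REFSPEC, "refs/heads/master"))
    # A ref in another namespace (hypothetical name) is not.
    print(map_ref(DEFAULT_REFSPEC, "refs/dead/heads/old-branch"))
    # It becomes fetchable once a matching refspec is added by the user.
    extra = "+refs/dead/heads/*:refs/remotes/origin/dead/*"
    print(map_ref(extra, "refs/dead/heads/old-branch"))
```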
As noted on overseers, once the DATESTAMP update has run at 00:16 UTC on
Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk
and change the SVN hooks to make SVN read-only, then disable gccadmin's
cron jobs that build snapshots and update online documentation until they
are ready to run with the git repository. Once the existing git mirror
has picked up the last changes I'll make that read-only and disable that
cron job as well, and start the conversion process with a view to having
the converted repository in place this weekend (it could either be made
writable as soon as I think it's ready, or left read-only until people
have had time to do any final checks on Monday). Before then, I'll work
on hooks, documentation and maintainer-scripts updates.
As well as having objects from the existing git mirror available under
refs that are not fetched by default, that mirror will remain available
read-only at git://gcc.gnu.org/git/gcc-old.git (which already exists,
currently a symlink to the mirror).
--
Joseph S. Myers
joseph@codesourcery.com