This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Test GCC conversion with reposurgeon available


First of all thanks to everyone who spent time making the conversion better and better. Here is my 2c, I have studied a little my colleagues trunk history in Maxim's gcc-pretty vs gcc-reposurgeon-5b.

1) In gcc-pretty timezone info is lost in both author/commiter date (keeping UTC time correct, certainly). Examples are r278990 and r289989. Probably git-svn causes this, current read-only git mirror is also without timezone. Not sure we need that info, but reposurgeon is more correct here.

2) Some thoughts about script for summarizing commit log messages:
2a) Why r143753 and r150680 not have "re PR..." summary instead of "[multiple changes]" ? 2b) On the contrary r155892 have to mention two PRs, even "[multiple changes]" is better here, IMHO. 2c) In r130050 and r155902 we have "Rename too ... " in summary, not sure how to make it better. 2d) r146882 can have better summary if we somehow organize ChangeLog priority (gcc/ChangeLog is more important that testsuite one).

3) About author emails, see below
24.12.2019 21:14, Segher Boessenkool wrote:
On Tue, Dec 24, 2019 at 05:16:54PM +0000, Joseph Myers wrote:
On Tue, 24 Dec 2019, Segher Boessenkool wrote:
That's because that commit also edits ChangeLog entries from other
authors.  When a commit adds / edits ChangeLog entries for more than one
author (the difference between purely editing an existing entry and adding
a new one, possibly under an existing date/author header, for a
multi-author commit, is not something that can reliably be determined
automatically), the conversion falls back to using the committer identity
instead of picking one of the multiple relevant authors from the ChangeLog
files.
There is only one relevant author in r270511.  It edits a few wrong path
names in the previous changelog entries.  People often do similar things
(like fixing the commit date :-) )
Distinguishing "edits a previous ChangeLog entry" from "adds a new entry
under a previous ChangeLog header for a change included in the commit" is
a human judgement.
We are doing only one conversion here, the one of the GCC repo.  The
heuristic works, we checked it did.

Either never use <account>@gcc.gnu.org, or always use it, don't do the
worst of both worlds?
The heuristics here are to use an attribution from ChangeLog for the
author where unambiguous, but to use the committer (always @gcc.gnu.org /
@gnu.org [*], so avoiding attributions at the wrong company even where
people were using multiple addresses simultaneously for different changes)
as author if in doubt.
You never need that, and it is worse to use two different schemes than to
choose either.

I would have chosen the "<account>@gcc.gnu.org" scheme, because it is
simple and *correct*.  Other people wanted the nicer names.  Maxim's
conversion gets that correct.  Please copy it.

IMHO Segher is a bit categorical is the discussion, but I'll be glad to see brief description of Maxim's approach to manage emails, gcc-pretty shows better results. Speaking about the script counting authors from ChangeLog files, even if we drop an "edits a previous ChangeLog entry" issue, it still sometimes work not as Joseph described: 3a) In r155892, r155893 and r259314 Alex is not counted as the only author without any reason. 3b) In r139854, r141108 and r196252 script selected the author successfully, while actually there are more that one. 3c) Maybe here we can also somehow organize ChangeLog priority (again, gcc/ChangeLog is more important that testsuite one). There are a lot of examples, when testsuite/ChangeLog entry have another author: r145055, r150680, r155889, r155894, r155890, r163904, r180186 and r183325.
3d) If we fix 3b+3c we can also look at r143753, r155890 and r155895.
3e) r155891, r207422, r183627 and r234218 are examples of commits which don't touch any ChangeLog files for different reasons. Seems unsolvable in current approach.

--
Roman


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]