Commit messages and the move to git

Richard Earnshaw (lists)
Mon Nov 18 19:21:00 GMT 2019

On 18/11/2019 18:53, Segher Boessenkool wrote:
> On Mon, Nov 18, 2019 at 05:38:14PM +0000, Richard Earnshaw (lists) wrote:
>> On 18/11/2019 17:11, Segher Boessenkool wrote:
>>> I think that non-obviously-wrong-but-still-wrong info is not something
>>> we should introduce.  And, I think this will happen a *lot*.
>>> Maybe you can just put in artificial subjects like "Patch related to
>>> PR12345" or the like?  That is correct, displays a lot better, and is
>>> at least as useful (imo).
>>>> There are about 560 commits where the code highlights that the data
>>>> might be suspect (usually a category mismatch).
>>> What about commits that mention multiple PRs?  What do you do for those?
>>> (Are there as many of those as I think, anyway?)  With normally very short
>>> subjects you could put all of them in there :-)
>> Depends on the context.  If there look to be multiple date-stamp-author 
>> patterns and the lines are not identical we put:
>> [multiple commits]
>> If there are more that three PRs and the word backport appears in the 
>> text, then it generates a 'backport' summary of the form
>> Backport PRs 91606, 91772, 91790, 91812, 91968
>> or if there are more than about 10 prs,
>> Backport PRs 41611, 41905, 41906, 41961, 42006, 42025, 42057, 42069, 
>> 42078, 42084 and more
>> it's easy to change the thresholds.
>> Otherwise we just pick the first PR found.  The assumption in this case 
>> is that PRs are related and thus one summary is likely as good as another.
>> Yes, we can just print PR numbers, and we could print multiple numbers; 
>> but generally I find that less helpful than the summary.
>> Finally, note that this does *not* delete any information.  The summary 
>> is added in addition to the existing text rather than replacing it.
>> I've attached a sample from the start of the fixed list - the full list 
>> is far too big to post to give a flavour of how the script currently 
>> works.  Note that annotations of the form [checkme: ....] in the summary 
>> are for diagnostic purposes.  These are where heuristics suggest that 
>> there's a higher than normal chance that the PR number is incorrect and 
>> that manual auditing is recommended.  Such annotations would not be 
>> appropriate in the final conversion.
> Random examples: (2000 lines from the end of the file):
> PR target/92140: clang vs gcc optimizing with adc/sbb
> PR fortran/91926: assumed rank optional
> PR tree-optimization/91532: [SVE] Redundant predicated store in
> PR tree-optimization/92161: ICE in vect_get_vec_def_for_stmt_copy, at tree-vect-stmts.c:1687
> PR tree-optimization/92162: ICE in vect_create_epilog_for_reduction, at tree-vect-loop.c:4252
> PR c++/92015: internal compiler error: in cxx_eval_array_reference, at cp/constexpr.c:2568
> PR tree-optimization/92173: ICE in optab_for_tree_code, at optabs-tree.c:81
> PR tree-optimization/92173: ICE in optab_for_tree_code, at optabs-tree.c:81
> PR fortran/92174: runtime error: index 15 out of bounds for type 'gfc_expr *[15]
> Most of these aren't helpful at all, and none of these are good commit
> summaries.  The PR92173 one actually has identical commit messages btw,
> huh.  Ah, the second one (r277288) has the wrong changelog, but in the
> actual changelog file as well, not something any tool could fix up (or
> have we reached the singularity?)

Identical commits are normally from where the same commit is made to
multiple branches.  It's not uncommon to see this when bugs are fixed.
Of course, in the git conversion these will likely never be connected
up, since there's no metadata for the tools to work on.

Ultimately the question here is whether something like the above is more
or less useful than what we have today, which is summary lines of the form:

<date> <user> <email>

I think the above is just useless since it duplicates information
already in the short summaries, and unless we have an extraordiarily
high error rate, a summary based on the bugzilla entry is more likely to
be useful than that, even if it is far from perfect.

None of this would be necessary, of course, if we had good short
summaries already for every commit, which is why I'd like EVERYONE to
start doing that NOW, rather than waiting for the conversion to happen.


More information about the Gcc mailing list