Commit messages and the move to git

Richard Earnshaw (lists) Richard.Earnshaw@arm.com
Mon Dec 2 10:54:00 GMT 2019


On 19/11/2019 14:56, Jason Merrill wrote:
> On Mon, Nov 18, 2019 at 4:38 PM Richard Earnshaw (lists) <
> Richard.Earnshaw@arm.com> wrote:
> 
>> On 18/11/2019 20:53, Jason Merrill wrote:
>>> On Mon, Nov 18, 2019 at 2:51 PM Segher Boessenkool <
>>> segher@kernel.crashing.org> wrote:
>>>
>>>> On Mon, Nov 18, 2019 at 07:21:22PM +0000, Richard Earnshaw (lists)
>> wrote:
>>>>> On 18/11/2019 18:53, Segher Boessenkool wrote:
>>>>>> PR target/92140: clang vs gcc optimizing with adc/sbb
>>>>>> PR fortran/91926: assumed rank optional
>>>>>> PR tree-optimization/91532: [SVE] Redundant predicated store in
>>>> gcc.target/aarch64/fmla_2.c
>>>>>> PR tree-optimization/92161: ICE in vect_get_vec_def_for_stmt_copy, at
>>>> tree-vect-stmts.c:1687
>>>>>> PR tree-optimization/92162: ICE in vect_create_epilog_for_reduction,
>>>> at tree-vect-loop.c:4252
>>>>>> PR c++/92015: internal compiler error: in cxx_eval_array_reference, at
>>>> cp/constexpr.c:2568
>>>>>> PR tree-optimization/92173: ICE in optab_for_tree_code, at
>>>> optabs-tree.c:81
>>>>>> PR tree-optimization/92173: ICE in optab_for_tree_code, at
>>>> optabs-tree.c:81
>>>>>> PR fortran/92174: runtime error: index 15 out of bounds for type
>>>> 'gfc_expr *[15]
>>>>>>
>>>>>> Most of these aren't helpful at all, and none of these are good commit
>>>>>> summaries.  The PR92173 one actually has identical commit messages
>> btw,
>>>>>> huh.  Ah, the second one (r277288) has the wrong changelog, but in the
>>>>>> actual changelog file as well, not something any tool could fix up (or
>>>>>> have we reached the singularity?)
>>>>>
>>>>> Identical commits are normally from where the same commit is made to
>>>>> multiple branches.  It's not uncommon to see this when bugs are fixed.
>>>>
>>>> This is an actual mistake.  The commits are not identical at all, just
>>>> the commit messages are (and the changelog entries, too).  Not something
>>>> that happens to ften, but of course I hit it in the first random thing I
>>>> pick :-)
>>>>
>>>>> Ultimately the question here is whether something like the above is
>> more
>>>>> or less useful than what we have today, which is summary lines of the
>>>> form:
>>>>>
>>>>> <date> <user> <email>
>>>>
>>>> I already said I would prefer things like
>>>>    Patch related to PR323
>>>> as the patch subject lines.  No one argues that the current state of
>>>> affairs is good.  I argue that replacing this with often wrong and
>>>> irrelevant information isn't the best we can do.
>>>>
>>>
>>> How about using the first line that isn't a ChangeLog date/author line,
>>> without trying to rewrite/augment it?
>>>
>>> Jason
>>>
>>
>> It would certainly be another way of doing it.  Sometimes it would
>> produce almost the same as an unadulterated PR; sometimes it would
>> produce something more meaningful and sometimes it would be pretty
>> useless.  It probably would hit more cases than my current script in
>> that it wouldn't require the commit to mention a PR in it.
>>
>> The main problem is that the first line is often incomplete, and much of
>> it is also wasted with elements like the full path to a file that is
>> quite deep in the tree.  Lets take a quick example (the first I found in
>> the dump I have).
>>
>> 1998-12-17  Vladimir N. Makarov  <vmakarov@cygnus.com>
>>          * config/i60/i960.md (extendqihi2): Fix typo (usage ',' instead of
>>          ';').
>> 1998-12-17  Michael Tiemann  <tiemann@axon.cygnus.com>
>>          * i960.md (extend*, zero_extend*): Don't generate rtl that looks
>>          like (subreg:SI (reg:SI N) 0), because it's wrong, and it hides
>>          optimizations from the combiner.
>>
>> Firstly, this example misses a blank line between the author and the
>> change message itself, which makes distinguishing between this and the
>> multiple authors case harder.  Secondly, the entry really spans two
>> lines and cutting it off at the end of the first line would be, well a
>> bit odd.  We could try to piece things together more, by combining lines
>> until we find a sentence end ( \.$ or \.\s\s ), and we could also strip
>> off more of the path components to just leave the name of the file
>> committed.  For example,
>>
>> i960.md (extendqihi2): Fix typo (usage ',' instead of ';').
>>
>> That might work better, but obviously it's harder to handle and thus
>> more likely to create erroneous summaries.
>>
> 
> Yep. I don't think we need to worry about getting optimal one-line
> summaries for ancient commits; something reasonably unique should be plenty.
> 

Attached is the latest version of my script.  I used (very nearly) this 
to produce a conversion over the weekend and I've uploaded that here:

https://gitlab.com/rearnsha/gcc-trial-20191130

Note, that I might blow this away at any time.  IT IS NOT A FINAL 
CONVERSION.

Some other things to note:
- there are a number of known issues with the version of reposurgeon 
used for this that are being worked on
   - emptycommit-* tags - my control script was out-of-date
   - *deleted* branches - this is being worked on
   - weird dependencies around merges - this is being worked on
   - author attributions are sometimes incorrect - reported

The main difference between the attached script and the one I used for 
this conversion is that ChangeLog change that contain :: inside a 
function list is now handled correctly, resulting in a number of cases 
that were previously being missed now being correctly handled.

Choices I made:
- When a PR is used to derive the summary, I prefix this with 're' (as 
in the Latin 'in re'.
- long change hunks produce poor summaries.  To reduce the overhead:
   - path names are removed, leaving just the final file name
   - multiple files are replaced with [...] after the first filename
   - similarly, multiple function names are replaced with [...]
   - very long comments are truncated, preferably at the strongest
     punctuation mark, but sometimes after key words, such as 'if',
     'when', 'unless' and a few more.  Ultimately, if the line is
     still too long, we just break after an arbitrary space.
- Where possible useful summary lines that appear after an author,
   attribution are hoisted as a summary.
- certain key words in otherwise not very useful summary lines are
   also spotted and used to add [revert] or [backport] annotations to
   the summary.

No changes are made to the main commit log, if we add a new summary 
line, the entire original text is kept.

An example of a summary produced by this is for the commit to r278572, 
where the original log entry is:


     	Backported from mainline
     	2019-08-02  Jakub Jelinek  <jakub@redhat.com>

     	* quadmath.h (M_Eq, M_LOG2Eq, M_LOG10Eq, M_LN2q, M_LN10q, M_PIq,
     	M_PI_2q, M_PI_4q, M_1_PIq, M_2_PIq, M_2_SQRTPIq, M_SQRT2q,
     	M_SQRT1_2q): Use two more decimal places.

And the script then generates:

[backport] quadmath.h (M_Eq, [...]): Use two more decimal places.

as the summary.

R.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bugdb.py
Type: text/x-python
Size: 31670 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc/attachments/20191202/bcf0e721/attachment.py>


More information about the Gcc mailing list