This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Commit messages and the move to git
- From: "Richard Earnshaw (lists)" <Richard dot Earnshaw at arm dot com>
- To: Jason Merrill <jason at redhat dot com>
- Cc: Segher Boessenkool <segher at kernel dot crashing dot org>, Eric Raymond <esr at thyrsus dot com>, Jeff Law <law at redhat dot com>, GCC Development <gcc at gcc dot gnu dot org>
- Date: Wed, 4 Dec 2019 23:51:49 +0000
- Subject: Re: Commit messages and the move to git
- References: <20191107142727.GA72444@thyrsus.com> <a5e2428c-60d5-3157-6b5f-662cfa76767b@arm.com> <20191109060151.GA82270@thyrsus.com> <78a57894-74f2-94d5-65a1-3ce2bd934f32@arm.com> <20191118155549.GH16031@gate.crashing.org> <fb6aa5da-9bc7-1fb5-4dee-efdcdf849a46@arm.com> <20191118171115.GI16031@gate.crashing.org> <ac635c47-884c-3832-193b-d45ce9336bca@arm.com> <20191118185328.GJ16031@gate.crashing.org> <f03489b4-0d59-4003-d13b-5f0d7e83289a@arm.com> <20191118194450.GL16031@gate.crashing.org> <CADzB+2mL-e8J3yTVcuT2yn0ZgHu_KHjRqqXE-nJ4sVODyYi8+A@mail.gmail.com> <c11d0e89-ac80-5600-276c-397531f0417c@arm.com> <CADzB+2nPi_Vt-72D_TDzuqN86+HE9ANsMbhMsOkCWRSVWHdjVA@mail.gmail.com> <632176b4-0ff6-1389-1576-138700ea900b@arm.com>
On 02/12/2019 10:54, Richard Earnshaw (lists) wrote:
> On 19/11/2019 14:56, Jason Merrill wrote:
>> On Mon, Nov 18, 2019 at 4:38 PM Richard Earnshaw (lists) <
>> Richard.Earnshaw@arm.com> wrote:
>>
>>> On 18/11/2019 20:53, Jason Merrill wrote:
>>>> On Mon, Nov 18, 2019 at 2:51 PM Segher Boessenkool <
>>>> segher@kernel.crashing.org> wrote:
>>>>
>>>>> On Mon, Nov 18, 2019 at 07:21:22PM +0000, Richard Earnshaw (lists)
>>> wrote:
>>>>>> On 18/11/2019 18:53, Segher Boessenkool wrote:
>>>>>>> PR target/92140: clang vs gcc optimizing with adc/sbb
>>>>>>> PR fortran/91926: assumed rank optional
>>>>>>> PR tree-optimization/91532: [SVE] Redundant predicated store in
>>>>> gcc.target/aarch64/fmla_2.c
>>>>>>> PR tree-optimization/92161: ICE in
>>>>>>> vect_get_vec_def_for_stmt_copy, at
>>>>> tree-vect-stmts.c:1687
>>>>>>> PR tree-optimization/92162: ICE in vect_create_epilog_for_reduction,
>>>>> at tree-vect-loop.c:4252
>>>>>>> PR c++/92015: internal compiler error: in
>>>>>>> cxx_eval_array_reference, at
>>>>> cp/constexpr.c:2568
>>>>>>> PR tree-optimization/92173: ICE in optab_for_tree_code, at
>>>>> optabs-tree.c:81
>>>>>>> PR tree-optimization/92173: ICE in optab_for_tree_code, at
>>>>> optabs-tree.c:81
>>>>>>> PR fortran/92174: runtime error: index 15 out of bounds for type
>>>>> 'gfc_expr *[15]
>>>>>>>
>>>>>>> Most of these aren't helpful at all, and none of these are good
>>>>>>> commit
>>>>>>> summaries. The PR92173 one actually has identical commit messages
>>> btw,
>>>>>>> huh. Ah, the second one (r277288) has the wrong changelog, but
>>>>>>> in the
>>>>>>> actual changelog file as well, not something any tool could fix
>>>>>>> up (or
>>>>>>> have we reached the singularity?)
>>>>>>
>>>>>> Identical commits are normally from where the same commit is made to
>>>>>> multiple branches. It's not uncommon to see this when bugs are
>>>>>> fixed.
>>>>>
>>>>> This is an actual mistake. The commits are not identical at all, just
>>>>> the commit messages are (and the changelog entries, too). Not
>>>>> something
>>>>> that happens to ften, but of course I hit it in the first random
>>>>> thing I
>>>>> pick :-)
>>>>>
>>>>>> Ultimately the question here is whether something like the above is
>>> more
>>>>>> or less useful than what we have today, which is summary lines of the
>>>>> form:
>>>>>>
>>>>>> <date> <user> <email>
>>>>>
>>>>> I already said I would prefer things like
>>>>> Patch related to PR323
>>>>> as the patch subject lines. No one argues that the current state of
>>>>> affairs is good. I argue that replacing this with often wrong and
>>>>> irrelevant information isn't the best we can do.
>>>>>
>>>>
>>>> How about using the first line that isn't a ChangeLog date/author line,
>>>> without trying to rewrite/augment it?
>>>>
>>>> Jason
>>>>
>>>
>>> It would certainly be another way of doing it. Sometimes it would
>>> produce almost the same as an unadulterated PR; sometimes it would
>>> produce something more meaningful and sometimes it would be pretty
>>> useless. It probably would hit more cases than my current script in
>>> that it wouldn't require the commit to mention a PR in it.
>>>
>>> The main problem is that the first line is often incomplete, and much of
>>> it is also wasted with elements like the full path to a file that is
>>> quite deep in the tree. Lets take a quick example (the first I found in
>>> the dump I have).
>>>
>>> 1998-12-17 Vladimir N. Makarov <vmakarov@cygnus.com>
>>> * config/i60/i960.md (extendqihi2): Fix typo (usage ','
>>> instead of
>>> ';').
>>> 1998-12-17 Michael Tiemann <tiemann@axon.cygnus.com>
>>> * i960.md (extend*, zero_extend*): Don't generate rtl that
>>> looks
>>> like (subreg:SI (reg:SI N) 0), because it's wrong, and it hides
>>> optimizations from the combiner.
>>>
>>> Firstly, this example misses a blank line between the author and the
>>> change message itself, which makes distinguishing between this and the
>>> multiple authors case harder. Secondly, the entry really spans two
>>> lines and cutting it off at the end of the first line would be, well a
>>> bit odd. We could try to piece things together more, by combining lines
>>> until we find a sentence end ( \.$ or \.\s\s ), and we could also strip
>>> off more of the path components to just leave the name of the file
>>> committed. For example,
>>>
>>> i960.md (extendqihi2): Fix typo (usage ',' instead of ';').
>>>
>>> That might work better, but obviously it's harder to handle and thus
>>> more likely to create erroneous summaries.
>>>
>>
>> Yep. I don't think we need to worry about getting optimal one-line
>> summaries for ancient commits; something reasonably unique should be
>> plenty.
>>
>
> Attached is the latest version of my script. I used (very nearly) this
> to produce a conversion over the weekend and I've uploaded that here:
>
> https://gitlab.com/rearnsha/gcc-trial-20191130
>
> Note, that I might blow this away at any time. IT IS NOT A FINAL
> CONVERSION.
>
> Some other things to note:
> - there are a number of known issues with the version of reposurgeon
> used for this that are being worked on
> - emptycommit-* tags - my control script was out-of-date
> - *deleted* branches - this is being worked on
> - weird dependencies around merges - this is being worked on
> - author attributions are sometimes incorrect - reported
>
I've just pushed a new trial conversion:
https://gitlab.com/rearnsha/gcc-trial2-20191130
The main differences between this and the previous trial are:
- The author attributions should now be fixed, please let me know if you
see any anomalies in this respect.
- the emptycommit-* tags/branches are now gone.
- the 'tags' used for revert and backport now use more gittish style
revert: and backport:
- the log entries for c++ style functions containing :: are now handled
correctly by my summary generation script.
Other issues are still being worked on.
R.
> The main difference between the attached script and the one I used for
> this conversion is that ChangeLog change that contain :: inside a
> function list is now handled correctly, resulting in a number of cases
> that were previously being missed now being correctly handled.
>
> Choices I made:
> - When a PR is used to derive the summary, I prefix this with 're' (as
> in the Latin 'in re'.
> - long change hunks produce poor summaries. To reduce the overhead:
> - path names are removed, leaving just the final file name
> - multiple files are replaced with [...] after the first filename
> - similarly, multiple function names are replaced with [...]
> - very long comments are truncated, preferably at the strongest
> punctuation mark, but sometimes after key words, such as 'if',
> 'when', 'unless' and a few more. Ultimately, if the line is
> still too long, we just break after an arbitrary space.
> - Where possible useful summary lines that appear after an author,
> attribution are hoisted as a summary.
> - certain key words in otherwise not very useful summary lines are
> also spotted and used to add [revert] or [backport] annotations to
> the summary.
>
> No changes are made to the main commit log, if we add a new summary
> line, the entire original text is kept.
>
> An example of a summary produced by this is for the commit to r278572,
> where the original log entry is:
>
>
> Backported from mainline
> 2019-08-02 Jakub Jelinek <jakub@redhat.com>
>
> * quadmath.h (M_Eq, M_LOG2Eq, M_LOG10Eq, M_LN2q, M_LN10q, M_PIq,
> M_PI_2q, M_PI_4q, M_1_PIq, M_2_PIq, M_2_SQRTPIq, M_SQRT2q,
> M_SQRT1_2q): Use two more decimal places.
>
> And the script then generates:
>
> [backport] quadmath.h (M_Eq, [...]): Use two more decimal places.
>
> as the summary.
>
> R.