[PATCH v3] arm: Low overhead loop handle long range branches [PR98931]

Andrea Corallo andrea.corallo@arm.com
Wed Feb 10 17:44:42 GMT 2021


Andrea Corallo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>
>> On 09/02/2021 16:27, Andrea Corallo via Gcc-patches wrote:
>>> Jakub Jelinek <jakub@redhat.com> writes:
>>> 
>>>> On Tue, Feb 09, 2021 at 03:09:43PM +0100, Jakub Jelinek via Gcc-patches wrote:
>>>>>>    "TARGET_32BIT && TARGET_HAVE_LOB"
>>>>>> -  "le\t%|lr, %l0")
>>>>>> +  "*
>>>>>> +  if (get_attr_length (insn) == 4)
>>>>>> +    return \"le\\t%|lr, %l0\";
>>>>>> +  else
>>>>>> +    return \"subs\\t%|lr, #1\;bne\\t%l0\";
>>>>>> +  "
>>>>>
>>>>> Why not
>>>>> {
>>>>>   if (get_attr_length (insn) == 4)
>>>>>     return "le\t%|lr, %l0";
>>>>>   else
>>>>>     return "subs\t%|lr, #1;bne\t%l0";
>>>>> }
>>>>> instead?  Seems the arm backend uses "*..." more than the more modern {},
>>>>> but one needs to backslash prefix a lot which makes it less readable?
>>>>
>>>> Where "more modern" is introduced 19.5 years ago ;)
>>>>
>>>> 	Jakub
>>> 
>>> I guess we really like traditions :)
>>> 
>>> Attached second version addressing this.
>>> 
>>> Thanks
>>> 
>>>   Andrea
>>> 
>>
>> You're missing a clobber of the condition codes in the RTL.  This wasn't
>> needed before, but is now.
>>
>> R.
>
> Hi Richard,
>
> thanks for reviewing, I guess this is going to be a good learning moment
> for me.
>
> What we originally expand is:
>
> (insn 2396 2360 2397 3 (parallel [
>             (set (reg:CC_NZ 100 cc)
>                 (compare:CC_NZ (plus:SI (reg:SI 14 lr)
>                         (const_int -1 [0xffffffffffffffff]))
>                     (const_int 0 [0])))
>             (set (reg:SI 14 lr)
>                 (plus:SI (reg:SI 14 lr)
>                     (const_int -1 [0xffffffffffffffff])))
>         ]) "p1.c":4:21 -1
>      (nil))
> (jump_insn 2397 2396 2365 3 (set (pc)
>         (if_then_else (ne (reg:CC_NZ 100 cc)
>                 (const_int 0 [0]))
>             (label_ref:SI 2361)
>             (pc))) "p1.c":4:21 273 {arm_cond_branch}
>      (expr_list:REG_DEAD (reg:CC_NZ 100 cc)
>         (int_list:REG_BR_PROB 1062895996 (nil)))
>  -> 2361)
>
> Combine recognizing cc:CC_NZ as a dead reg and rewriting the two insns
> as:
>
> (jump_insn 2397 2396 2365 3 (parallel [
>             (set (pc)
>                 (if_then_else (ne (reg:SI 14 lr)
>                         (const_int 1 [0x1]))
>                     (label_ref:SI 2361)
>                     (pc)))
>             (set (reg:SI 14 lr)
>                 (plus:SI (reg:SI 14 lr)
>                     (const_int -1 [0xffffffffffffffff])))
>         ]) "p1.c":4:21 1047 {*doloop_end_internal}
>      (int_list:REG_BR_PROB 1062895996 (nil))
>  -> 2361)
>
> I originally thought that because the write of reg:CC_NZ is explicit in
> the first pattern we expand this was sufficient, but I now understand
> I'm wrong and combine should produce a pattern still expressing this.
> Now the question is how to do that.
>
> If I add the clobber '(clobber (reg:CC CC_REGNUM))' inside the parallel
> of *doloop_end_internal as last element of the vector we ICE in
> 'add_clobbers' called during combine, apparently 'add_clobbers' does not
> handle the insn_code_number.
>
> If I add it as second element of the parallel combine is not combining
> the two insns.
>
> If I place the clobber outside the parallel as a second element of the
> insn vector combine is crashing in 'recog_for_combine_1'.
>
> So the question is probably: where should the clobber be positioned
> canonically to have this working?
>
> Thanks!
>
>   Andrea

Righ, I've been explained by a knowledgeable colleague that the
'parallel' is implicit in the 'define_insn' and there's no need to
express it (interestgly this is confusing the code generating
'add_clobbers').

The attached patch rewrites the pattern as such and adds the missing
clobber.

Tests are running, okay for trunk when done with these?

Regards

  Andrea

-------------- next part --------------
>From c822f9374f1c16a0f863ca521fd87a350b616db4 Mon Sep 17 00:00:00 2001
From: Andrea Corallo <andrea.corallo@arm.com>
Date: Wed, 3 Feb 2021 15:21:54 +0100
Subject: [PATCH] arm: Low overhead loop handle long range branches [PR98931]

gcc/ChangeLog

2021-02-05  Andrea Corallo  <andrea.corallo@arm.com>

        * config/arm/thumb2.md (*doloop_end_internal): Generate
	alternative sequence to handle long range branches.

gcc/testsuite/Changelog

2021-02-08  Andrea Corallo  <andrea.corallo@arm.com>

         * gcc.target/arm/pr98931.c: New testcase.
---
 gcc/config/arm/thumb2.md               | 28 ++++++++++++++++++--------
 gcc/testsuite/gcc.target/arm/pr98931.c | 17 ++++++++++++++++
 2 files changed, 37 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr98931.c

diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index bd53bf320de..320fe998075 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1711,15 +1711,27 @@
 
 ;; Originally expanded by 'doloop_end'.
 (define_insn "*doloop_end_internal"
-  [(parallel [(set (pc)
-                   (if_then_else
-                       (ne (reg:SI LR_REGNUM) (const_int 1))
-                     (label_ref (match_operand 0 "" ""))
-                     (pc)))
-              (set (reg:SI LR_REGNUM)
-                   (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
+  [(set (pc)
+        (if_then_else
+            (ne (reg:SI LR_REGNUM) (const_int 1))
+          (label_ref (match_operand 0 "" ""))
+          (pc)))
+   (set (reg:SI LR_REGNUM)
+        (plus:SI (reg:SI LR_REGNUM) (const_int -1)))
+   (clobber (reg:CC CC_REGNUM))]
   "TARGET_32BIT && TARGET_HAVE_LOB"
-  "le\t%|lr, %l0")
+  {
+    if (get_attr_length (insn) == 4)
+      return "le\t%|lr, %l0";
+    else
+      return "subs\t%|lr, #1;bne\t%l0";
+  }
+  [(set (attr "length")
+        (if_then_else
+            (lt (minus (pc) (match_dup 0)) (const_int 1024))
+	    (const_int 4)
+	    (const_int 6)))
+   (set_attr "type" "branch")])
 
 (define_expand "doloop_begin"
   [(match_operand 0 "" "")
diff --git a/gcc/testsuite/gcc.target/arm/pr98931.c b/gcc/testsuite/gcc.target/arm/pr98931.c
new file mode 100644
index 00000000000..313876a3912
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr98931.c
@@ -0,0 +1,17 @@
+/* { dg-do assemble } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
+/* { dg-options "-march=armv8.1-m.main -O3 --param=max-completely-peeled-insns=1300 --save-temps" } */
+
+extern long long a[][20][26][26][22];
+
+void
+foo ()
+{
+  for (short d = 0; d + 1; d++)
+    for (unsigned e = 0; e < 25; e += 4)
+      for (unsigned f = 0; f < 25; f += 4)
+        for (int g = 0; g < 21; g += 4)
+          a[4][d][e][f][g] = 0;
+}
+
+/* { dg-final { scan-assembler-not {le\slr,\s\S*} } } */
-- 
2.20.1



More information about the Gcc-patches mailing list