[PATCH v3] arm: Low overhead loop handle long range branches [PR98931]
Andrea Corallo
andrea.corallo@arm.com
Wed Feb 10 17:44:42 GMT 2021
Andrea Corallo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>
>> On 09/02/2021 16:27, Andrea Corallo via Gcc-patches wrote:
>>> Jakub Jelinek <jakub@redhat.com> writes:
>>>
>>>> On Tue, Feb 09, 2021 at 03:09:43PM +0100, Jakub Jelinek via Gcc-patches wrote:
>>>>>> "TARGET_32BIT && TARGET_HAVE_LOB"
>>>>>> - "le\t%|lr, %l0")
>>>>>> + "*
>>>>>> + if (get_attr_length (insn) == 4)
>>>>>> + return \"le\\t%|lr, %l0\";
>>>>>> + else
>>>>>> + return \"subs\\t%|lr, #1\;bne\\t%l0\";
>>>>>> + "
>>>>>
>>>>> Why not
>>>>> {
>>>>> if (get_attr_length (insn) == 4)
>>>>> return "le\t%|lr, %l0";
>>>>> else
>>>>> return "subs\t%|lr, #1;bne\t%l0";
>>>>> }
>>>>> instead? Seems the arm backend uses "*..." more than the more modern {},
>>>>> but one needs to backslash prefix a lot which makes it less readable?
>>>>
>>>> Where "more modern" is introduced 19.5 years ago ;)
>>>>
>>>> Jakub
>>>
>>> I guess we really like traditions :)
>>>
>>> Attached second version addressing this.
>>>
>>> Thanks
>>>
>>> Andrea
>>>
>>
>> You're missing a clobber of the condition codes in the RTL. This wasn't
>> needed before, but is now.
>>
>> R.
>
> Hi Richard,
>
> thanks for reviewing, I guess this is going to be a good learning moment
> for me.
>
> What we originally expand is:
>
> (insn 2396 2360 2397 3 (parallel [
> (set (reg:CC_NZ 100 cc)
> (compare:CC_NZ (plus:SI (reg:SI 14 lr)
> (const_int -1 [0xffffffffffffffff]))
> (const_int 0 [0])))
> (set (reg:SI 14 lr)
> (plus:SI (reg:SI 14 lr)
> (const_int -1 [0xffffffffffffffff])))
> ]) "p1.c":4:21 -1
> (nil))
> (jump_insn 2397 2396 2365 3 (set (pc)
> (if_then_else (ne (reg:CC_NZ 100 cc)
> (const_int 0 [0]))
> (label_ref:SI 2361)
> (pc))) "p1.c":4:21 273 {arm_cond_branch}
> (expr_list:REG_DEAD (reg:CC_NZ 100 cc)
> (int_list:REG_BR_PROB 1062895996 (nil)))
> -> 2361)
>
> Combine recognizing cc:CC_NZ as a dead reg and rewriting the two insns
> as:
>
> (jump_insn 2397 2396 2365 3 (parallel [
> (set (pc)
> (if_then_else (ne (reg:SI 14 lr)
> (const_int 1 [0x1]))
> (label_ref:SI 2361)
> (pc)))
> (set (reg:SI 14 lr)
> (plus:SI (reg:SI 14 lr)
> (const_int -1 [0xffffffffffffffff])))
> ]) "p1.c":4:21 1047 {*doloop_end_internal}
> (int_list:REG_BR_PROB 1062895996 (nil))
> -> 2361)
>
> I originally thought that because the write of reg:CC_NZ is explicit in
> the first pattern we expand this was sufficient, but I now understand
> I'm wrong and combine should produce a pattern still expressing this.
> Now the question is how to do that.
>
> If I add the clobber '(clobber (reg:CC CC_REGNUM))' inside the parallel
> of *doloop_end_internal as last element of the vector we ICE in
> 'add_clobbers' called during combine, apparently 'add_clobbers' does not
> handle the insn_code_number.
>
> If I add it as second element of the parallel combine is not combining
> the two insns.
>
> If I place the clobber outside the parallel as a second element of the
> insn vector combine is crashing in 'recog_for_combine_1'.
>
> So the question is probably: where should the clobber be positioned
> canonically to have this working?
>
> Thanks!
>
> Andrea
Righ, I've been explained by a knowledgeable colleague that the
'parallel' is implicit in the 'define_insn' and there's no need to
express it (interestgly this is confusing the code generating
'add_clobbers').
The attached patch rewrites the pattern as such and adds the missing
clobber.
Tests are running, okay for trunk when done with these?
Regards
Andrea
-------------- next part --------------
>From c822f9374f1c16a0f863ca521fd87a350b616db4 Mon Sep 17 00:00:00 2001
From: Andrea Corallo <andrea.corallo@arm.com>
Date: Wed, 3 Feb 2021 15:21:54 +0100
Subject: [PATCH] arm: Low overhead loop handle long range branches [PR98931]
gcc/ChangeLog
2021-02-05 Andrea Corallo <andrea.corallo@arm.com>
* config/arm/thumb2.md (*doloop_end_internal): Generate
alternative sequence to handle long range branches.
gcc/testsuite/Changelog
2021-02-08 Andrea Corallo <andrea.corallo@arm.com>
* gcc.target/arm/pr98931.c: New testcase.
---
gcc/config/arm/thumb2.md | 28 ++++++++++++++++++--------
gcc/testsuite/gcc.target/arm/pr98931.c | 17 ++++++++++++++++
2 files changed, 37 insertions(+), 8 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/arm/pr98931.c
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index bd53bf320de..320fe998075 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1711,15 +1711,27 @@
;; Originally expanded by 'doloop_end'.
(define_insn "*doloop_end_internal"
- [(parallel [(set (pc)
- (if_then_else
- (ne (reg:SI LR_REGNUM) (const_int 1))
- (label_ref (match_operand 0 "" ""))
- (pc)))
- (set (reg:SI LR_REGNUM)
- (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
+ [(set (pc)
+ (if_then_else
+ (ne (reg:SI LR_REGNUM) (const_int 1))
+ (label_ref (match_operand 0 "" ""))
+ (pc)))
+ (set (reg:SI LR_REGNUM)
+ (plus:SI (reg:SI LR_REGNUM) (const_int -1)))
+ (clobber (reg:CC CC_REGNUM))]
"TARGET_32BIT && TARGET_HAVE_LOB"
- "le\t%|lr, %l0")
+ {
+ if (get_attr_length (insn) == 4)
+ return "le\t%|lr, %l0";
+ else
+ return "subs\t%|lr, #1;bne\t%l0";
+ }
+ [(set (attr "length")
+ (if_then_else
+ (lt (minus (pc) (match_dup 0)) (const_int 1024))
+ (const_int 4)
+ (const_int 6)))
+ (set_attr "type" "branch")])
(define_expand "doloop_begin"
[(match_operand 0 "" "")
diff --git a/gcc/testsuite/gcc.target/arm/pr98931.c b/gcc/testsuite/gcc.target/arm/pr98931.c
new file mode 100644
index 00000000000..313876a3912
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr98931.c
@@ -0,0 +1,17 @@
+/* { dg-do assemble } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
+/* { dg-options "-march=armv8.1-m.main -O3 --param=max-completely-peeled-insns=1300 --save-temps" } */
+
+extern long long a[][20][26][26][22];
+
+void
+foo ()
+{
+ for (short d = 0; d + 1; d++)
+ for (unsigned e = 0; e < 25; e += 4)
+ for (unsigned f = 0; f < 25; f += 4)
+ for (int g = 0; g < 21; g += 4)
+ a[4][d][e][f][g] = 0;
+}
+
+/* { dg-final { scan-assembler-not {le\slr,\s\S*} } } */
--
2.20.1
More information about the Gcc-patches
mailing list