Created attachment 53452 [details] Minimal test case GCC 12.1.0 generates incorrect code for loops that call into noreturn functions starting at -O1. For example, __attribute__((noreturn)) void g(void); void f(int *values) { for(int i = 0; i < 8; i++) if(values[i] != 0) g(); } gets compiled into sts.l pr,@-r15 mov #8,r1 .L3: bt.s .L2 dt r1 mov.l .L5,r1 # _g jsr @r1 nop .L2: bf .L3 lds.l @r15+,pr rts nop which has the obvious issue of not reading its input, and also using the T bit through bt.s before doing any test. I suppose this is an overly aggressive loop/CFG optimization, but I was not able to trim it down as unfolding -O1 into the list of optimizations provided by -Q --help=optimizers produced a completely different program. The bug still occurs with -fno-aggressive-loop-optimizations at least. With -funroll-all-loops, instead of 8 mov.l/tst/bf I just get 8 bf which suggests that the test is wrongly eliminated causing the load to become dead. Found in: GCC 12.1.0 Commit 4991e2092 of today's master Not found in: GCC 11.1.0 Configured with: ../gcc-12.1.0/configure --prefix=$PREFIX --target=sh3eb-elf --enable-languages=c --without-headers Reproduction instructions: $PREFIX/bin/sh3eb-elf-gcc -S noreturn-loop-bug.c -o - -O1
.L3: bt.s .L2 dt r1 are there any delay slots in SH? Does -fno-schedule-insns -fno-schedule-insns2 help?
Yes there are delay slots for all branches except bt and bf, so here bt.s, jsr and rts all have one. (-fno-delayed-branches avoids them but that doesn't affect the bad optimization in this case.) Adding -fno-schedule-insns -fno-schedule-insns2 doesn't help; the test and load are still eliminated. In fact, I negated the entirety of -Q --help=optimizers and still got an incorrect output. It seems that the problem lies further downstream. I dumped various RTL stages and -fdump-rtl-mach is the first stage where the comparison to 0 is gone, if the order in the manual is anything to go by. Since I have both functional and non-functional outputs from different RTL stages, I'm switching the component to rtl-optimization.
> -fdump-rtl-mach is the first stage where the comparison to 0 is gone Then this is a target specific issue until provided otherwise. mach stands for machine (target) specific pass.
> Then this is a target specific issue until provided otherwise. mach stands for machine (target) specific pass. That makes a lot of sense, thanks. I found a much simpler example exhibiting the bug: int f(int i, int j) { if(i < 0) return 1; if(i + j) return 3; return i; } Latest HEAD with -O1 (same setup as before) compiles it to _f: cmp/pz r4 bf .L3 bf .L4 rts mov r4,r0 .L3: rts mov #1,r0 .L4: rts mov #3,r0 incorrectly eliminating the test for (i + j) != 0. The label .L4 returning 3 is evidently unreachable. Comparing to 0 seems to be required. My previous example also compiles correctly if we check values[i] != 1 instead, which invalidates the loop/CFG theory. Few commits changed the SH subtree since GCC 11.1.0; I should be able to bisect it. In the meantime I've updated the title.
First bad commit is r12-1955-ga86b3453fc6e29cf0e19916b01c393652d838d56, though I don't know what path is taken from there to the incorrect rewrite.
I can reproduce with gcc-12 as host compiler, but not with gcc-11 as host compiler. I'm using the example in comment #4, and a recent HEAD of master (bac07a1da24) configured with --target=sh3eb-elf. With "GCC: (GNU) 11.3.1 20220812" as host compiler, the cross-compiler compiles (-O1) the example to: _f: cmp/pz r4 bf.s .L3 add r4,r5 tst r5,r5 bf .L4 rts mov r4,r0 .align 1 .L3: rts mov #1,r0 .align 1 .L4: rts mov #3,r0 which looks ok to me (not knowing SH asm...). With "GCC: (GNU) 12.1.1 20220812" (the 12.2.0 RC) as host compiler, the cross-compiler instead compiles the example to: _f: cmp/pz r4 bf .L3 bf .L4 rts mov r4,r0 .align 1 .L3: rts mov #1,r0 .align 1 .L4: rts mov #3,r0 which looks broken.
>I can reproduce with gcc-12 as host compiler, but not with gcc-11 as host compiler. That is good to know. Now someone needs to figure out where the target compiler is being miscompiled ...
# first bad commit: [3155d51bfd1de8b6c4645dcb2292248a8d7cc3c9] [PATCH] PR rtl-optimization/46235: Improved use of bt for bit tests on x86_64. Starting with this commit, the host compiler (on x86_64-linux) miscompiles the gcc-13 based cross-compiler to sh3eb-elf.
The command line option -mtune-ctrl=^use_bt should disable all of the patterns added in the patch identified as the first bad commit.
GCC 12.2 is being released, retargeting bugs to GCC 12.3.
I tried compiling the gcc-13 cross compiler using the broken gcc-12 host compiler and -mtune-ctrl=^use_bt but that didn't help. I then tried rebuilding the broken gcc-12 host compiler with the new splitters disabled, one by one. Disabling the "*bt<mode>_setcqi" one did unbreak the gcc-13 cross-compiler: diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 48532eb7ddf..0780ba992f3 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -12830,7 +12830,7 @@ (const_int 1) (zero_extend:SI (match_operand:QI 2 "register_operand")))) (clobber (reg:CC FLAGS_REG))] - "TARGET_USE_BT && ix86_pre_reload_split ()" + "0 && TARGET_USE_BT && ix86_pre_reload_split ()" "#" "&& 1" [(set (reg:CCC FLAGS_REG)
(In reply to Mikael Pettersson from comment #12) > I tried compiling the gcc-13 cross compiler using the broken gcc-12 host > compiler and -mtune-ctrl=^use_bt but that didn't help. > > I then tried rebuilding the broken gcc-12 host compiler with the new > splitters disabled, one by one. Disabling the "*bt<mode>_setcqi" one did > unbreak the gcc-13 cross-compiler: > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index 48532eb7ddf..0780ba992f3 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -12830,7 +12830,7 @@ > (const_int 1) > (zero_extend:SI (match_operand:QI 2 "register_operand")))) > (clobber (reg:CC FLAGS_REG))] > - "TARGET_USE_BT && ix86_pre_reload_split ()" > + "0 && TARGET_USE_BT && ix86_pre_reload_split ()" > "#" > "&& 1" > [(set (reg:CCC FLAGS_REG) Ok, reproduced with last night's gcc trunk as the x86_64-linux system compiler (with/without the above patch) and r12-8924-ga6b1f6126de5e4 as 12 branch for the cross-compiler. The difference appears first in the sh_treg_combine2 dump -not a condition store -other set found - aborting trace +inverted condition store +tracing ccreg +set of ccreg not found + +cbranch trace summary: etc. And bisection points to insn-preds.o.
Ugh, wasted time. This is (well, was) a SH backend bug, fixed apparently on the trunk by Jeff in r13-4118-ge214cab68cb34e77622b91113f7698cf137bbdd6 gcc/config/sh/sh_treg_combine.cc contained bogus // FIXME: Remove dependency on SH predicate function somehow. extern int t_reg_operand (rtx, machine_mode); extern int negt_reg_operand (rtx, machine_mode); declarations, when the definitions of those functions actually were bool t_reg_operand (rtx op, machine_mode mode ATTRIBUTE_UNUSED) { ... } bool negt_reg_operand (rtx op, machine_mode mode ATTRIBUTE_UNUSED) { ... } As x86_64 ABI on bool return only guarantees the state of the low 8 bits, the callee could leave garbage in the upper 56 bits of the %rax register, and then the caller would test 32 bits against zero as it was told it returns int.
Thanks, turns out my bisected commit was related after all... I can confirm that test cases from OP and #4 (with protocol from OP) are no longer broken for me on yesterday's master.
so just needs a 12 backport?
GCC 12.3 is being released, retargeting bugs to GCC 12.4.