Bug 106609 - [12 Regression] sh3eb-elf cross compiler is being miscompiled since r12-1525-g3155d51bfd1de8b6c4645
Summary: [12 Regression] sh3eb-elf cross compiler is being miscompiled since r12-1525-...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 12.1.0
: P4 blocker
Target Milestone: 12.4
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2022-08-13 17:31 UTC by Sébastien Michelland
Modified: 2023-05-08 12:25 UTC (History)
6 users (show)

See Also:
Host:
Target: x86_64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-11-24 00:00:00


Attachments
Minimal test case (150 bytes, text/plain)
2022-08-13 17:31 UTC, Sébastien Michelland
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sébastien Michelland 2022-08-13 17:31:54 UTC
Created attachment 53452 [details]
Minimal test case

GCC 12.1.0 generates incorrect code for loops that call into noreturn functions starting at -O1. For example,

__attribute__((noreturn)) void g(void);
void f(int *values)
{
    for(int i = 0; i < 8; i++)
        if(values[i] != 0) g();
}

gets compiled into

	sts.l	pr,@-r15
	mov	#8,r1
.L3:	bt.s	.L2
	dt	r1
	mov.l	.L5,r1 # _g
	jsr	@r1
	nop
.L2:	bf	.L3
	lds.l	@r15+,pr
	rts	
	nop

which has the obvious issue of not reading its input, and also using the T bit through bt.s before doing any test.

I suppose this is an overly aggressive loop/CFG optimization, but I was not able to trim it down as unfolding -O1 into the list of optimizations provided by -Q --help=optimizers produced a completely different program. The bug still occurs with -fno-aggressive-loop-optimizations at least.

With -funroll-all-loops, instead of 8 mov.l/tst/bf I just get 8 bf which suggests that the test is wrongly eliminated causing the load to become dead.

Found in:
  GCC 12.1.0
  Commit 4991e2092 of today's master
Not found in:
  GCC 11.1.0
Configured with:
  ../gcc-12.1.0/configure --prefix=$PREFIX --target=sh3eb-elf --enable-languages=c --without-headers
Reproduction instructions:
  $PREFIX/bin/sh3eb-elf-gcc -S noreturn-loop-bug.c -o - -O1
Comment 1 Richard Biener 2022-08-15 08:28:29 UTC
.L3:	bt.s	.L2
	dt	r1

are there any delay slots in SH?  Does -fno-schedule-insns -fno-schedule-insns2 help?
Comment 2 Sébastien Michelland 2022-08-15 09:47:06 UTC
Yes there are delay slots for all branches except bt and bf, so here bt.s, jsr and rts all have one. (-fno-delayed-branches avoids them but that doesn't affect the bad optimization in this case.)

Adding -fno-schedule-insns -fno-schedule-insns2 doesn't help; the test and load are still eliminated. In fact, I negated the entirety of -Q --help=optimizers and still got an incorrect output.

It seems that the problem lies further downstream. I dumped various RTL stages and -fdump-rtl-mach is the first stage where the comparison to 0 is gone, if the order in the manual is anything to go by. Since I have both functional and non-functional outputs from different RTL stages, I'm switching the component to rtl-optimization.
Comment 3 Andrew Pinski 2022-08-15 15:52:56 UTC
> -fdump-rtl-mach is the first stage where the comparison to 0 is gone

Then this is a target specific issue until provided otherwise. mach stands for machine (target) specific pass.
Comment 4 Sébastien Michelland 2022-08-16 21:03:43 UTC
> Then this is a target specific issue until provided otherwise. mach stands for machine (target) specific pass.

That makes a lot of sense, thanks.

I found a much simpler example exhibiting the bug:

int f(int i, int j)
{
    if(i < 0) return 1;
    if(i + j) return 3;
    return i;
}

Latest HEAD with -O1 (same setup as before) compiles it to

_f:	cmp/pz	r4
	bf	.L3
	bf	.L4
	rts	
	mov	r4,r0
.L3:	rts	
	mov	#1,r0
.L4:	rts	
	mov	#3,r0

incorrectly eliminating the test for (i + j) != 0. The label .L4 returning 3 is evidently unreachable.

Comparing to 0 seems to be required. My previous example also compiles correctly if we check values[i] != 1 instead, which invalidates the loop/CFG theory.

Few commits changed the SH subtree since GCC 11.1.0; I should be able to bisect it. In the meantime I've updated the title.
Comment 5 Sébastien Michelland 2022-08-16 21:03:59 UTC
> Then this is a target specific issue until provided otherwise. mach stands for machine (target) specific pass.

That makes a lot of sense, thanks.

I found a much simpler example exhibiting the bug:

int f(int i, int j)
{
    if(i < 0) return 1;
    if(i + j) return 3;
    return i;
}

Latest HEAD with -O1 (same setup as before) compiles it to

_f:	cmp/pz	r4
	bf	.L3
	bf	.L4
	rts	
	mov	r4,r0
.L3:	rts	
	mov	#1,r0
.L4:	rts	
	mov	#3,r0

incorrectly eliminating the test for (i + j) != 0. The label .L4 returning 3 is evidently unreachable.

Comparing to 0 seems to be required. My previous example also compiles correctly if we check values[i] != 1 instead, which invalidates the loop/CFG theory.

Few commits changed the SH subtree since GCC 11.1.0; I should be able to bisect it. In the meantime I've updated the title.
Comment 6 Sébastien Michelland 2022-08-17 08:09:57 UTC
First bad commit is r12-1955-ga86b3453fc6e29cf0e19916b01c393652d838d56, though I don't know what path is taken from there to the incorrect rewrite.
Comment 7 Mikael Pettersson 2022-08-17 16:51:58 UTC
I can reproduce with gcc-12 as host compiler, but not with gcc-11 as host compiler.

I'm using the example in comment #4, and a recent HEAD of master (bac07a1da24) configured with --target=sh3eb-elf.

With "GCC: (GNU) 11.3.1 20220812" as host compiler, the cross-compiler compiles (-O1) the example to:

_f:
        cmp/pz  r4
        bf.s    .L3
        add     r4,r5
        tst     r5,r5
        bf      .L4
        rts
        mov     r4,r0
        .align 1
.L3:
        rts
        mov     #1,r0
        .align 1
.L4:
        rts
        mov     #3,r0

which looks ok to me (not knowing SH asm...).

With "GCC: (GNU) 12.1.1 20220812" (the 12.2.0 RC) as host compiler, the cross-compiler instead compiles the example to:

_f:
        cmp/pz  r4
        bf      .L3
        bf      .L4
        rts
        mov     r4,r0
        .align 1
.L3:
        rts
        mov     #1,r0
        .align 1
.L4:
        rts
        mov     #3,r0

which looks broken.
Comment 8 Andrew Pinski 2022-08-17 17:53:40 UTC
>I can reproduce with gcc-12 as host compiler, but not with gcc-11 as host compiler.

That is good to know. Now someone needs to figure out where the target compiler is being miscompiled ...
Comment 9 Mikael Pettersson 2022-08-18 20:12:57 UTC
# first bad commit: [3155d51bfd1de8b6c4645dcb2292248a8d7cc3c9] [PATCH] PR rtl-optimization/46235: Improved use of bt for bit tests on x86_64.

Starting with this commit, the host compiler (on x86_64-linux) miscompiles the gcc-13 based cross-compiler to sh3eb-elf.
Comment 10 Roger Sayle 2022-08-19 08:06:26 UTC
The command line option -mtune-ctrl=^use_bt should disable all of the patterns added in the patch identified as the first bad commit.
Comment 11 Richard Biener 2022-08-19 08:27:23 UTC
GCC 12.2 is being released, retargeting bugs to GCC 12.3.
Comment 12 Mikael Pettersson 2022-08-20 11:40:30 UTC
I tried compiling the gcc-13 cross compiler using the broken gcc-12 host compiler and -mtune-ctrl=^use_bt but that didn't help.

I then tried rebuilding the broken gcc-12 host compiler with the new splitters disabled, one by one. Disabling the "*bt<mode>_setcqi" one did unbreak the gcc-13 cross-compiler:

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 48532eb7ddf..0780ba992f3 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12830,7 +12830,7 @@
          (const_int 1)
          (zero_extend:SI (match_operand:QI 2 "register_operand"))))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_USE_BT && ix86_pre_reload_split ()"
+  "0 && TARGET_USE_BT && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (reg:CCC FLAGS_REG)
Comment 13 Jakub Jelinek 2022-11-24 13:13:49 UTC
(In reply to Mikael Pettersson from comment #12)
> I tried compiling the gcc-13 cross compiler using the broken gcc-12 host
> compiler and -mtune-ctrl=^use_bt but that didn't help.
> 
> I then tried rebuilding the broken gcc-12 host compiler with the new
> splitters disabled, one by one. Disabling the "*bt<mode>_setcqi" one did
> unbreak the gcc-13 cross-compiler:
> 
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 48532eb7ddf..0780ba992f3 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -12830,7 +12830,7 @@
>           (const_int 1)
>           (zero_extend:SI (match_operand:QI 2 "register_operand"))))
>     (clobber (reg:CC FLAGS_REG))]
> -  "TARGET_USE_BT && ix86_pre_reload_split ()"
> +  "0 && TARGET_USE_BT && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
>    [(set (reg:CCC FLAGS_REG)

Ok, reproduced with last night's gcc trunk as the x86_64-linux system compiler
(with/without the above patch) and r12-8924-ga6b1f6126de5e4 as 12 branch for the cross-compiler.  The difference appears first in the
sh_treg_combine2 dump
-not a condition store
-other set found - aborting trace
+inverted condition store
+tracing ccreg
+set of ccreg not found
+
+cbranch trace summary:
etc.
And bisection points to insn-preds.o.
Comment 14 Jakub Jelinek 2022-11-24 14:38:14 UTC
Ugh, wasted time.
This is (well, was) a SH backend bug, fixed apparently on the trunk by Jeff
in r13-4118-ge214cab68cb34e77622b91113f7698cf137bbdd6
gcc/config/sh/sh_treg_combine.cc
contained bogus
// FIXME: Remove dependency on SH predicate function somehow.
extern int t_reg_operand (rtx, machine_mode);
extern int negt_reg_operand (rtx, machine_mode);
declarations, when the definitions of those functions actually were
bool t_reg_operand (rtx op, machine_mode mode ATTRIBUTE_UNUSED) { ... }
bool negt_reg_operand (rtx op, machine_mode mode ATTRIBUTE_UNUSED) { ... }
As x86_64 ABI on bool return only guarantees the state of the low 8 bits,
the callee could leave garbage in the upper 56 bits of the %rax register,
and then the caller would test 32 bits against zero as it was told it returns int.
Comment 15 Sébastien Michelland 2022-11-24 17:04:28 UTC
Thanks, turns out my bisected commit was related after all...

I can confirm that test cases from OP and #4 (with protocol from OP) are no longer broken for me on yesterday's master.
Comment 16 Sam James 2023-02-28 23:08:04 UTC
so just needs a 12 backport?
Comment 17 Richard Biener 2023-05-08 12:25:16 UTC
GCC 12.3 is being released, retargeting bugs to GCC 12.4.