106609 – [12 Regression] sh3eb-elf cross compiler is being miscompiled since r12-1525-g3155d51bfd1de8b6c4645

Bug 106609 - [12 Regression] sh3eb-elf cross compiler is being miscompiled since r12-1525-g3155d51bfd1de8b6c4645

Summary: [12 Regression] sh3eb-elf cross compiler is being miscompiled since r12-1525-...

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	12.1.0

Importance:	P4 blocker
Target Milestone:	12.5
Assignee:	Not yet assigned to anyone

URL:
Keywords:	wrong-code

Depends on:
Blocks:

Reported:	2022-08-13 17:31 UTC by Sébastien Michelland
Modified:	2024-06-20 09:09 UTC (History)
CC List:	6 users (show)

See Also:	46235
Host:
Target:	x86_64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:	2022-11-24 00:00:00

Attachments
Minimal test case (150 bytes, text/plain) 2022-08-13 17:31 UTC, Sébastien Michelland	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Sébastien Michelland 2022-08-13 17:31:54 UTC

Created attachment 53452 [details]
Minimal test case

GCC 12.1.0 generates incorrect code for loops that call into noreturn functions starting at -O1. For example,

__attribute__((noreturn)) void g(void);
void f(int *values)
{
    for(int i = 0; i < 8; i++)
        if(values[i] != 0) g();
}

gets compiled into

	sts.l	pr,@-r15
	mov	#8,r1
.L3:	bt.s	.L2
	dt	r1
	mov.l	.L5,r1 # _g
	jsr	@r1
	nop
.L2:	bf	.L3
	lds.l	@r15+,pr
	rts	
	nop

which has the obvious issue of not reading its input, and also using the T bit through bt.s before doing any test.

I suppose this is an overly aggressive loop/CFG optimization, but I was not able to trim it down as unfolding -O1 into the list of optimizations provided by -Q --help=optimizers produced a completely different program. The bug still occurs with -fno-aggressive-loop-optimizations at least.

With -funroll-all-loops, instead of 8 mov.l/tst/bf I just get 8 bf which suggests that the test is wrongly eliminated causing the load to become dead.

Found in:
  GCC 12.1.0
  Commit 4991e2092 of today's master
Not found in:
  GCC 11.1.0
Configured with:
  ../gcc-12.1.0/configure --prefix=$PREFIX --target=sh3eb-elf --enable-languages=c --without-headers
Reproduction instructions:
  $PREFIX/bin/sh3eb-elf-gcc -S noreturn-loop-bug.c -o - -O1

Comment 1 Richard Biener 2022-08-15 08:28:29 UTC

.L3:	bt.s	.L2
	dt	r1

are there any delay slots in SH?  Does -fno-schedule-insns -fno-schedule-insns2 help?

Comment 2 Sébastien Michelland 2022-08-15 09:47:06 UTC

Yes there are delay slots for all branches except bt and bf, so here bt.s, jsr and rts all have one. (-fno-delayed-branches avoids them but that doesn't affect the bad optimization in this case.)

Adding -fno-schedule-insns -fno-schedule-insns2 doesn't help; the test and load are still eliminated. In fact, I negated the entirety of -Q --help=optimizers and still got an incorrect output.

It seems that the problem lies further downstream. I dumped various RTL stages and -fdump-rtl-mach is the first stage where the comparison to 0 is gone, if the order in the manual is anything to go by. Since I have both functional and non-functional outputs from different RTL stages, I'm switching the component to rtl-optimization.

Comment 3 Andrew Pinski 2022-08-15 15:52:56 UTC

> -fdump-rtl-mach is the first stage where the comparison to 0 is gone

Then this is a target specific issue until provided otherwise. mach stands for machine (target) specific pass.

Comment 4 Sébastien Michelland 2022-08-16 21:03:43 UTC

> Then this is a target specific issue until provided otherwise. mach stands for machine (target) specific pass.

That makes a lot of sense, thanks.

I found a much simpler example exhibiting the bug:

int f(int i, int j)
{
    if(i < 0) return 1;
    if(i + j) return 3;
    return i;
}

Latest HEAD with -O1 (same setup as before) compiles it to

_f:	cmp/pz	r4
	bf	.L3
	bf	.L4
	rts	
	mov	r4,r0
.L3:	rts	
	mov	#1,r0
.L4:	rts	
	mov	#3,r0

incorrectly eliminating the test for (i + j) != 0. The label .L4 returning 3 is evidently unreachable.

Comparing to 0 seems to be required. My previous example also compiles correctly if we check values[i] != 1 instead, which invalidates the loop/CFG theory.

Few commits changed the SH subtree since GCC 11.1.0; I should be able to bisect it. In the meantime I've updated the title.

Comment 5 Sébastien Michelland 2022-08-16 21:03:59 UTC

> Then this is a target specific issue until provided otherwise. mach stands for machine (target) specific pass.

That makes a lot of sense, thanks.

I found a much simpler example exhibiting the bug:

int f(int i, int j)
{
    if(i < 0) return 1;
    if(i + j) return 3;
    return i;
}

Latest HEAD with -O1 (same setup as before) compiles it to

_f:	cmp/pz	r4
	bf	.L3
	bf	.L4
	rts	
	mov	r4,r0
.L3:	rts	
	mov	#1,r0
.L4:	rts	
	mov	#3,r0

incorrectly eliminating the test for (i + j) != 0. The label .L4 returning 3 is evidently unreachable.

Comparing to 0 seems to be required. My previous example also compiles correctly if we check values[i] != 1 instead, which invalidates the loop/CFG theory.

Few commits changed the SH subtree since GCC 11.1.0; I should be able to bisect it. In the meantime I've updated the title.

Comment 6 Sébastien Michelland 2022-08-17 08:09:57 UTC

First bad commit is r12-1955-ga86b3453fc6e29cf0e19916b01c393652d838d56, though I don't know what path is taken from there to the incorrect rewrite.

Comment 7 Mikael Pettersson 2022-08-17 16:51:58 UTC

I can reproduce with gcc-12 as host compiler, but not with gcc-11 as host compiler.

I'm using the example in comment #4, and a recent HEAD of master (bac07a1da24) configured with --target=sh3eb-elf.

With "GCC: (GNU) 11.3.1 20220812" as host compiler, the cross-compiler compiles (-O1) the example to:

_f:
        cmp/pz  r4
        bf.s    .L3
        add     r4,r5
        tst     r5,r5
        bf      .L4
        rts
        mov     r4,r0
        .align 1
.L3:
        rts
        mov     #1,r0
        .align 1
.L4:
        rts
        mov     #3,r0

which looks ok to me (not knowing SH asm...).

With "GCC: (GNU) 12.1.1 20220812" (the 12.2.0 RC) as host compiler, the cross-compiler instead compiles the example to:

_f:
        cmp/pz  r4
        bf      .L3
        bf      .L4
        rts
        mov     r4,r0
        .align 1
.L3:
        rts
        mov     #1,r0
        .align 1
.L4:
        rts
        mov     #3,r0

which looks broken.

Comment 8 Andrew Pinski 2022-08-17 17:53:40 UTC

>I can reproduce with gcc-12 as host compiler, but not with gcc-11 as host compiler.

That is good to know. Now someone needs to figure out where the target compiler is being miscompiled ...

Comment 9 Mikael Pettersson 2022-08-18 20:12:57 UTC

# first bad commit: [3155d51bfd1de8b6c4645dcb2292248a8d7cc3c9] [PATCH] PR rtl-optimization/46235: Improved use of bt for bit tests on x86_64.

Starting with this commit, the host compiler (on x86_64-linux) miscompiles the gcc-13 based cross-compiler to sh3eb-elf.

Comment 10 Roger Sayle 2022-08-19 08:06:26 UTC

The command line option -mtune-ctrl=^use_bt should disable all of the patterns added in the patch identified as the first bad commit.

Comment 11 Richard Biener 2022-08-19 08:27:23 UTC

GCC 12.2 is being released, retargeting bugs to GCC 12.3.

Comment 12 Mikael Pettersson 2022-08-20 11:40:30 UTC

I tried compiling the gcc-13 cross compiler using the broken gcc-12 host compiler and -mtune-ctrl=^use_bt but that didn't help.

I then tried rebuilding the broken gcc-12 host compiler with the new splitters disabled, one by one. Disabling the "*bt<mode>_setcqi" one did unbreak the gcc-13 cross-compiler:

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 48532eb7ddf..0780ba992f3 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12830,7 +12830,7 @@
          (const_int 1)
          (zero_extend:SI (match_operand:QI 2 "register_operand"))))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_USE_BT && ix86_pre_reload_split ()"
+  "0 && TARGET_USE_BT && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(set (reg:CCC FLAGS_REG)

Comment 13 Jakub Jelinek 2022-11-24 13:13:49 UTC

(In reply to Mikael Pettersson from comment #12)
> I tried compiling the gcc-13 cross compiler using the broken gcc-12 host
> compiler and -mtune-ctrl=^use_bt but that didn't help.
> 
> I then tried rebuilding the broken gcc-12 host compiler with the new
> splitters disabled, one by one. Disabling the "*bt<mode>_setcqi" one did
> unbreak the gcc-13 cross-compiler:
> 
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 48532eb7ddf..0780ba992f3 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -12830,7 +12830,7 @@
>           (const_int 1)
>           (zero_extend:SI (match_operand:QI 2 "register_operand"))))
>     (clobber (reg:CC FLAGS_REG))]
> -  "TARGET_USE_BT && ix86_pre_reload_split ()"
> +  "0 && TARGET_USE_BT && ix86_pre_reload_split ()"
>    "#"
>    "&& 1"
>    [(set (reg:CCC FLAGS_REG)

Ok, reproduced with last night's gcc trunk as the x86_64-linux system compiler
(with/without the above patch) and r12-8924-ga6b1f6126de5e4 as 12 branch for the cross-compiler.  The difference appears first in the
sh_treg_combine2 dump
-not a condition store
-other set found - aborting trace
+inverted condition store
+tracing ccreg
+set of ccreg not found
+
+cbranch trace summary:
etc.
And bisection points to insn-preds.o.

Comment 14 Jakub Jelinek 2022-11-24 14:38:14 UTC

Ugh, wasted time.
This is (well, was) a SH backend bug, fixed apparently on the trunk by Jeff
in r13-4118-ge214cab68cb34e77622b91113f7698cf137bbdd6
gcc/config/sh/sh_treg_combine.cc
contained bogus
// FIXME: Remove dependency on SH predicate function somehow.
extern int t_reg_operand (rtx, machine_mode);
extern int negt_reg_operand (rtx, machine_mode);
declarations, when the definitions of those functions actually were
bool t_reg_operand (rtx op, machine_mode mode ATTRIBUTE_UNUSED) { ... }
bool negt_reg_operand (rtx op, machine_mode mode ATTRIBUTE_UNUSED) { ... }
As x86_64 ABI on bool return only guarantees the state of the low 8 bits,
the callee could leave garbage in the upper 56 bits of the %rax register,
and then the caller would test 32 bits against zero as it was told it returns int.

Comment 15 Sébastien Michelland 2022-11-24 17:04:28 UTC

Thanks, turns out my bisected commit was related after all...

I can confirm that test cases from OP and #4 (with protocol from OP) are no longer broken for me on yesterday's master.

Comment 16 Sam James 2023-02-28 23:08:04 UTC

so just needs a 12 backport?

Comment 17 Richard Biener 2023-05-08 12:25:16 UTC

GCC 12.3 is being released, retargeting bugs to GCC 12.4.

Comment 18 Richard Biener 2024-06-20 09:09:42 UTC

GCC 12.4 is being released, retargeting bugs to GCC 12.5.