Bug 51513 - Only partially optimizes away __builtin_unreachable switch default case
Summary: Only partially optimizes away __builtin_unreachable switch default case
Status: REOPENED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.6.2
: P3 enhancement
Target Milestone: ---
Assignee: Peter Bergner
URL: https://gcc.gnu.org/ml/gcc-patches/20...
Keywords: missed-optimization
: 60362 65922 (view as bug list)
Depends on:
Blocks:
 
Reported: 2011-12-12 10:53 UTC by Steinar H. Gunderson
Modified: 2020-04-17 18:31 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-04-17 00:00:00


Attachments
testcase (144 bytes, text/plain)
2020-04-07 23:09 UTC, Michał Mirosław
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Steinar H. Gunderson 2011-12-12 10:53:44 UTC
Hi,

I have code that looks like this:

pannekake:~> cat test.c
void foo();
void bar();
void baz();

void func(int i)
{
	switch (i) {
		case 0: foo(); break;
		case 1: bar(); break;
		case 2: baz(); break;
		case 3: baz(); break;
		case 4: bar(); break;
		case 5: foo(); break;
		case 6: foo(); break;
		case 7: bar(); break;
		case 8: baz(); break;
		case 9: baz(); break;
		case 10: bar(); break;
		default: __builtin_unreachable(); break;
	}
}

Compiling this yields:

pannekake:~> gcc-4.6 -O2 -c test.c && objdump --disassemble test.o
             
test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <func>:
   0:	83 ff 0a             	cmp    $0xa,%edi
   3:	76 03                	jbe    8 <func+0x8>
   5:	0f 1f 00             	nopl   (%rax)
   8:	89 ff                	mov    %edi,%edi
   a:	31 c0                	xor    %eax,%eax
   c:	ff 24 fd 00 00 00 00 	jmpq   *0x0(,%rdi,8)
  13:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  18:	e9 00 00 00 00       	jmpq   1d <func+0x1d>
  1d:	0f 1f 00             	nopl   (%rax)
  20:	e9 00 00 00 00       	jmpq   25 <func+0x25>
  25:	0f 1f 00             	nopl   (%rax)
  28:	e9 00 00 00 00       	jmpq   2d <func+0x2d>

The first compare is, as you can see, unneeded; the code for the default case itself (a repz ret) has been optimized away due to the __builtin_unreachable(), but the compare and branch remains.

I've also seen it sometimes be able to remove the jump instruction itself, but not the compare.
Comment 1 Steinar H. Gunderson 2011-12-12 10:54:16 UTC
Forgot this:

pannekake:~> gcc-4.6 -v
Using built-in specs.
COLLECT_GCC=gcc-4.6
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.2-5' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.6.2 (Debian 4.6.2-5)
Comment 2 Andrew Pinski 2011-12-15 01:24:30 UTC
Confirmed.
Comment 3 Emil L 2012-12-11 21:54:01 UTC
The optimization can either be done when emitting rtl in the expand_case function in stmt.c (and its callees) by first recognizing calls to __builtin_unreachable() at the default label, and then simply do not emit the compares and jumps to the default label. That task is quite easy.

Another option would be to add more general code in the later cfg optimization passes for rtl that removes all jumps (and corresponding compares) to basic blocks only containing barriers.

What do you think?

This optimization would be very interesting for interpreter implementators that use a switch statement to dispatch the next instruction, when they can guarantee that the default branch is never taken.
Comment 4 Steven Bosscher 2012-12-11 22:42:13 UTC
This should be solved by allowing labels in trivially_empty_bb_p.
Comment 5 Andrew Pinski 2014-02-28 01:37:18 UTC
*** Bug 60362 has been marked as a duplicate of this bug. ***
Comment 6 Andrew Pinski 2015-04-29 01:46:21 UTC
*** Bug 65922 has been marked as a duplicate of this bug. ***
Comment 7 Peter Bergner 2015-04-29 02:25:41 UTC
(In reply to Emil L from comment #3)
> This optimization would be very interesting for interpreter implementators
> that use a switch statement to dispatch the next instruction, when they can
> guarantee that the default branch is never taken.

I have actually hit the same issue with some code from PHP, so you're not too far off on your comment.  A reduced test case from PHP looks like:

void
foo (unsigned char *ptr, unsigned int cond)
{
  switch (cond)
    {
    case 0:
      return;
    case 1:
    case 2:
    case 3:
    case 4:
    case 6:
      *ptr += 1;
      return;
    case 5:
      *ptr += 2;
      return;
    default:
      __builtin_unreachable ();
      break;
    }
}

In this case, the undeleted branch had a wild label that pointed to nowhere.
Comment 8 Peter Bergner 2015-04-29 14:58:06 UTC
(In reply to Steven Bosscher from comment #4)
> This should be solved by allowing labels in trivially_empty_bb_p.

I tried the following, but it doesn't fix the problem.

--- cfgcleanup.c	(revision 222550)
+++ cfgcleanup.c	(working copy)
@@ -2647,7 +2647,8 @@ trivially_empty_bb_p (basic_block bb)
     {
       if (insn == BB_HEAD (bb))
 	return true;
-      if (!DEBUG_INSN_P (insn))
+      if (!(DEBUG_INSN_P (insn)
+	    || (LABEL_P (insn) && !LABEL_PRESERVE_P (insn))))
 	return false;
       insn = PREV_INSN (insn);
     }
Comment 9 Peter Bergner 2017-04-13 00:30:36 UTC
New patch submitted.
Comment 10 Peter Bergner 2017-05-10 16:50:30 UTC
Fixed on trunk with revision 247844.
Comment 11 Peter Bergner 2018-01-03 18:22:34 UTC
Closing as fixed.
Comment 12 Michał Mirosław 2020-04-07 23:09:03 UTC
Created attachment 48236 [details]
testcase

This bug reoccurred in gcc-8. gcc-7 and gcc-9+ seem not affected.

$ gcc-8 -O3 -S b.c -o -
[...]
foo:
.LFB0:
        .cfi_startproc
        cmpl    $1, %edi
        jne     .L10
        movslq  %esi, %rsi
        leaq    foz(%rip), %rax
        movl    (%rax,%rsi,4), %eax
        movl    %eax, (%rdx)
        xorl    %eax, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L10:
        cmpl    $2, %edi
        jne     .L11
.L3:
        movslq  %esi, %rsi
        leaq    baz(%rip), %rax
        movl    (%rax,%rsi,4), %eax
        movl    %eax, (%rdx)
        xorl    %eax, %eax
        ret
        .p2align 4,,10
        .p2align 3
.L11:
        testl   %edi, %edi
        je      .L3
        .cfi_endproc
[...]

$ gcc-8 -O -S b.c -o -
[...]
foo:
.LFB0:
        .cfi_startproc
        cmpl    $1, %edi
        je      .L2
        cmpl    $2, %edi
        je      .L3
        testl   %edi, %edi
        jne     .L4
.L3:
        movslq  %esi, %rsi
        leaq    baz(%rip), %rax
        movl    (%rax,%rsi,4), %eax
        movl    %eax, (%rdx)
.L5:
        movl    $0, %eax
        ret
.L2:
        movslq  %esi, %rsi
        leaq    foz(%rip), %rax
        movl    (%rax,%rsi,4), %eax
        movl    %eax, (%rdx)
        jmp     .L5
.L4:
        .cfi_endproc
Comment 13 Peter Bergner 2020-04-17 18:31:42 UTC
Confirmed, I see the same issue with the test case in Comment #12 on powerpc64le-linux using gcc-8.  I also see the original problem with the original test case using gcc-8 as well.