Bug 99151 - Missed optimization: Superfluous stack frame and code with noreturn or __builtin_unreachable()
Summary: Missed optimization: Superfluous stack frame and code with noreturn or __buil...
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 11.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2021-02-18 14:45 UTC by Sebastian Huber
Modified: 2021-03-01 13:21 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sebastian Huber 2021-02-18 14:45:57 UTC
The following test code:

_Noreturn void r(void);
void u(void);

void g(void)
{
        r();
}


void f(void)
{
        u();
        __builtin_unreachable();
}

Produces code the following code on a sample set of architectures. There should be no stack frame set up. There should be no instructions after the no-return function calls (for example sparc).

sparc-rtems6-gcc -O2 -o - -S unreachable.c 
        .file   "unreachable.c"
        .section        ".text"
        .align 4
        .global g
        .type   g, #function
        .proc   020
g:
        save    %sp, -96, %sp
        call    r, 0
         nop
        nop
        .size   g, .-g
        .align 4
        .global f
        .type   f, #function
        .proc   020
f:
        save    %sp, -96, %sp
        call    u, 0
         nop
        nop
        .size   f, .-f
        .ident  "GCC: (GNU) 10.2.1 20210205 (RTEMS 6, RSB 61dcadee0825867ebe51f9f367430ef75b8fe9c0, Newlib d4a756f)"

arm-rtems6-gcc -O2 -o - -S unreachable.c 
        .cpu arm7tdmi
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 2
        .eabi_attribute 34, 0
        .eabi_attribute 18, 4
        .file   "unreachable.c"
        .text
        .align  2
        .global g
        .arch armv4t
        .syntax unified
        .arm
        .fpu softvfp
        .type   g, %function
g:
        @ Function supports interworking.
        @ Volatile: function does not return.
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        push    {r4, lr}
        bl      r
        .size   g, .-g
        .align  2
        .global f
        .syntax unified
        .arm
        .fpu softvfp
        .type   f, %function
f:
        @ Function supports interworking.
        @ Volatile: function does not return.
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        push    {r4, lr}
        bl      u
        .size   f, .-f
        .ident  "GCC: (GNU) 10.2.1 20210205 (RTEMS 6, RSB 61dcadee0825867ebe51f9f367430ef75b8fe9c0, Newlib d4a756f)"
powerpc-rtems6-gcc -O2 -o - -S unreachable.c 
        .file   "unreachable.c"
        .machine ppc
        .section        ".text"
        .align 2
        .globl g
        .type   g, @function
g:
.LFB0:
        .cfi_startproc
        stwu 1,-16(1)
        .cfi_def_cfa_offset 16
        mflr 0
        stw 0,20(1)
        .cfi_offset 65, 4
        bl r
        .cfi_endproc
.LFE0:
        .size   g,.-g
        .align 2
        .globl f
        .type   f, @function
f:
.LFB1:
        .cfi_startproc
        stwu 1,-16(1)
        .cfi_def_cfa_offset 16
        mflr 0
        stw 0,20(1)
        .cfi_offset 65, 4
        bl u
        .cfi_endproc
.LFE1:
        .size   f,.-f
        .ident  "GCC: (GNU) 10.2.1 20210205 (RTEMS 6, RSB 61dcadee0825867ebe51f9f367430ef75b8fe9c0, Newlib d4a756f)"
        .section        .note.GNU-stack,"",@progbits
Comment 1 Jakub Jelinek 2021-02-18 14:52:36 UTC
This is done intentionally, so that one gets e.g. usable backtraces from abort.
Comment 2 Sebastian Huber 2021-02-18 14:56:08 UTC
(In reply to Jakub Jelinek from comment #1)
> This is done intentionally, so that one gets e.g. usable backtraces from
> abort.

Ok, the stack frame could be a feature.

The extra nop on SPARC hurts a bit in my case since I have to show that my program has no unreachable code and here I get an unreachable nop:

f:
        save    %sp, -96, %sp
        call    u, 0
         nop
        nop <--------------- unreachable
Comment 3 Andrew Pinski 2021-02-18 19:15:37 UTC
Is the nop due to alignment?
Comment 4 Sebastian Huber 2021-02-19 06:46:58 UTC
(In reply to Andrew Pinski from comment #3)
> Is the nop due to alignment?

With -g and -coverage I get this code:

sparc-rtems7-gcc -O2 -o - -S unreachable.c -fverbose-asm -g -coverage
        .file   "unreachable.c"
! GNU C17 (GCC) version 11.0.0 20210205 (RTEMS 7, RSB 61dcadee0825867ebe51f9f367430ef75b8fe9c0, Newlib d4a756f) (sparc-rtems7)
!       compiled by GNU C version 11.0.0 20201130 (experimental) [revision b46314c78061a5156bac44a317c87d32b00d4295], GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

! GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
! options passed: -mcpu=v7 -g -O2 -fprofile-arcs -ftest-coverage
        .section        ".text"
.LLtext0:
        .cfi_sections   .debug_frame
        .align 4
        .global g
        .type   g, #function
        .proc   020
g:
.LLFB0:
        .file 1 "unreachable.c"
        .loc 1 5 1 view -0
        .cfi_startproc
        save    %sp, -96, %sp   !
        .cfi_window_save
        .cfi_register 15, 31
        .cfi_def_cfa_register 30
        sethi   %hi(__gcov0.g), %g1     !, tmp112
        ldd     [%g1+%lo(__gcov0.g)], %i4       ! __gcov0.g[0], __gcov0.g[0]
        addcc   %i5, 1, %g3     ! __gcov0.g[0],, tmp115
        addx    %i4, 0, %g2     ! __gcov0.g[0],,
        .loc 1 6 9 view .LLVU1
        call    r, 0    !,
.LLVL0:
         std    %g2, [%g1+%lo(__gcov0.g)]       ! tmp115, __gcov0.g[0]
        nop

In the the no-return function case, this nop has no line debug information and there is no profiling counter increment.

        .cfi_endproc
.LLFE0:
        .size   g, .-g
        .align 4
        .global f
        .type   f, #function
        .proc   020
f:
.LLFB1:
        .loc 1 11 1 view -0
        .cfi_startproc
        save    %sp, -96, %sp   !
        .cfi_window_save
        .cfi_register 15, 31
        .cfi_def_cfa_register 30
        sethi   %hi(__gcov0.f), %g1     !, tmp114
        ldd     [%g1+%lo(__gcov0.f)], %i2       ! __gcov0.f[0], __gcov0.f[0]
        addcc   %i3, 1, %g3     ! __gcov0.f[0],, tmp117
        addx    %i2, 0, %g2     ! __gcov0.f[0],,
        or      %g1, %lo(__gcov0.f), %i5        ! tmp114,, tmp113
        .loc 1 12 9 view .LLVU3
        call    u, 0    !,
.LLVL1:
         std    %g2, [%g1+%lo(__gcov0.f)]       ! tmp117, __gcov0.f[0]
        ldd     [%i5+8], %i2    ! __gcov0.f[1], __gcov0.f[1]
        addcc   %i3, 1, %g3     ! __gcov0.f[1],, tmp123
        addx    %i2, 0, %g2     ! __gcov0.f[1],,
        std     %g2, [%i5+8]    ! tmp123, __gcov0.f[1]
        .loc 1 13 9 view .LLVU4
        .cfi_endproc
.LLFE1:
        .size   f, .-f

In the __builtin_unreachable() case we don't have a nop, but a profiling counter increment is there.

clang 9 generates this code:

clang -O2 -o - -S unreachable.c -fverbose-asm -g -coverage --target=sparc
        .text
        .file   "unreachable.c"
        .globl  g                       ! -- Begin function g
        .p2align        2
        .type   g,@function
g:                                      ! @g
.Lfunc_begin0:
        .file   1 "/home/EB/sebastian_h/src/rtems/unreachable.c"
        .loc    1 5 0                   ! unreachable.c:5:0
        .cfi_startproc
! %bb.0:
        save %sp, -96, %sp
        .cfi_def_cfa_register %fp
        .cfi_window_save
        .cfi_register 15, 31
.Ltmp0:
        .loc    1 6 9 prologue_end      ! unreachable.c:6:9
        sethi %hi(__llvm_gcov_ctr), %i0
        ldd [%i0+%lo(__llvm_gcov_ctr)], %i2
        addcc %i3, 1, %i5
        addxcc %i2, 0, %i4
        call r
        std %i4, [%i0+%lo(__llvm_gcov_ctr)]
.Ltmp1:
.Lfunc_end0:
        .size   g, .Lfunc_end0-g
        .cfi_endproc
                                        ! -- End function
        .globl  f                       ! -- Begin function f
        .p2align        2
        .type   f,@function
f:                                      ! @f
.Lfunc_begin1:
        .loc    1 11 0                  ! unreachable.c:11:0
        .cfi_startproc
! %bb.0:
        save %sp, -96, %sp
        .cfi_def_cfa_register %fp
        .cfi_window_save
        .cfi_register 15, 31
.Ltmp2:
        .loc    1 12 9 prologue_end     ! unreachable.c:12:9
        sethi %hi(__llvm_gcov_ctr.1), %i0
        ldd [%i0+%lo(__llvm_gcov_ctr.1)], %i2
        addcc %i3, 1, %i5
        addxcc %i2, 0, %i4
        call u
        std %i4, [%i0+%lo(__llvm_gcov_ctr.1)]
.Ltmp3:
.Lfunc_end1:
        .size   f, .Lfunc_end1-f
        .cfi_endproc
                                        ! -- End function

There are no nops and no unreachable profiling counter increments.
Comment 5 Eric Botcazou 2021-03-01 11:17:05 UTC
static void
sparc_asm_function_epilogue (FILE *file)
{
  /* If the last two instructions of a function are "call foo; dslot;"
     the return address might point to the first instruction in the next
     function and we have to output a dummy nop for the sake of sane
     backtraces in such cases.  This is pointless for sibling calls since
     the return address is explicitly adjusted.  */
Comment 6 Sebastian Huber 2021-03-01 12:36:40 UTC
(In reply to Eric Botcazou from comment #5)
> static void
> sparc_asm_function_epilogue (FILE *file)
> {
>   /* If the last two instructions of a function are "call foo; dslot;"
>      the return address might point to the first instruction in the next
>      function and we have to output a dummy nop for the sake of sane
>      backtraces in such cases.  This is pointless for sibling calls since
>      the return address is explicitly adjusted.  */

This nop behaviour could be a bit inconsistent across architectures. For example, arm and powerpc don't generate a nop here.

I still think that the profiling counter increment in the __builtin_unreachable() path is a bug.
Comment 7 Eric Botcazou 2021-03-01 12:53:03 UTC
> This nop behaviour could be a bit inconsistent across architectures. For
> example, arm and powerpc don't generate a nop here.

Well, it's low-level trickery so architecture-dependent by definition.

> I still think that the profiling counter increment in the
> __builtin_unreachable() path is a bug.

How so?  I only see a missed optimization, but with -fprofile-arcs -ftest-coverage you're splitting hairs IMO.
Comment 8 Sebastian Huber 2021-03-01 13:04:39 UTC
(In reply to Eric Botcazou from comment #7)
> > I still think that the profiling counter increment in the
> > __builtin_unreachable() path is a bug.
> 
> How so?  I only see a missed optimization, but with -fprofile-arcs
> -ftest-coverage you're splitting hairs IMO.

The __builtin_unreachable() explicitly mentions the use case with a function which doesn't return in the documentation:

https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

I would expect from the compiler that it generates then similar code. Having a profiling counter increment in one case and not in the other is not really a nice behaviour.
Comment 9 Jakub Jelinek 2021-03-01 13:21:35 UTC
We certainly don't claim it is the exact same thing as using the noreturn attribute, it is not.
We could in profile.c ignore edges that lead to basic blocks that start with __builtin_unreachable () call as the first non-debug stmt after labels if it was really important, but I'm not convinced it is.