GCC 9.0.0 (20180924) generates these thunks on x86-64: __x86_indirect_thunk_rdi: .LFB1: .cfi_startproc call .LIND1 .LIND0: pause lfence jmp .LIND0 .LIND1: mov %rdi, (%rsp) ret .cfi_endproc .LFE1: I don't think the CFI is correct. At the ret instruction, the CFI indicates that the return address is at the top of the stack. The unwinder will use this return address and subtract one because it's a non-signal handler frame. But the resulting address is located before the start of the function, so it will locate an incorrect FDE based on it. Indeed I see this when si-stepping through the execution with GDB: (gdb) disas Dump of assembler code for function __x86_indirect_thunk_rdi: 0x00000000004004a5 <+0>: callq 0x4004b1 <__x86_indirect_thunk_rdi+12> 0x00000000004004aa <+5>: pause 0x00000000004004ac <+7>: lfence 0x00000000004004af <+10>: jmp 0x4004aa <__x86_indirect_thunk_rdi+5> 0x00000000004004b1 <+12>: mov %rdi,(%rsp) => 0x00000000004004b5 <+16>: retq 0x00000000004004b6 <+17>: nopw %cs:0x0(%rax,%rax,1) (gdb) bt #0 0x00000000004004b5 in __x86_indirect_thunk_rdi () #1 0x0000000000400490 in frame_dummy () at /tmp/cfi.c:16 #2 0x000000000040038e in main () at /tmp/cfi.c:16 End of assembler dump. (gdb) print f2 $1 = {int (void)} 0x400490 <f2> Note the “frame_dummy” instead of “f2” in the backtrace. Test program: __attribute__ ((weak)) int f1 (int (*f2) (void)) { return f2 (); } int f2 (void) { } int main (void) { f1 (f2); } We had a bit of an internal debate whether it's actually possible to produce correct CFI for this. I think we can reflect the stack pointer adjustment after the thunk-internal call in the CFI, so that the unwinder continues to see the original caller of the thunk. Due to the address decrement, this needs to happen for the jmp instruction, not after the .LIND1 label. As an alternative, it would be possible to error out when -mindirect-branch=thunk is used with -fasynchronous-unwind-tables, but since the latter is the default, this would be a bit harsh.
I think we want to emit .cfi_def_cfa_offset 16 after .LIND1: label, because the call acts as a pushm so the caller (of the thunk) is no longer at $sp+8 after the call, but at $sp+16.
Created attachment 44744 [details] gcc9-pr87414.patch Untested fix.
Author: jakub Date: Wed Sep 26 17:00:49 2018 New Revision: 264651 URL: https://gcc.gnu.org/viewcvs?rev=264651&root=gcc&view=rev Log: PR target/87414 * config/i386/i386.c: Include debug.h and dwarf2out.h. (output_indirect_thunk): Emit DW_CFA_def_cfa_offset after the call. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c
Author: jakub Date: Thu Oct 11 17:40:31 2018 New Revision: 265048 URL: https://gcc.gnu.org/viewcvs?rev=265048&root=gcc&view=rev Log: Backported from mainline 2018-09-26 Jakub Jelinek <jakub@redhat.com> PR target/87414 * config/i386/i386.c: Include debug.h and dwarf2out.h. (output_indirect_thunk): Emit DW_CFA_def_cfa_offset after the call. Modified: branches/gcc-8-branch/gcc/ChangeLog branches/gcc-8-branch/gcc/config/i386/i386.c
Fixed for 8.3+.