This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] i386: Don't use frame pointer without stack access


On Mon, Aug 7, 2017 at 7:06 AM, Alexander Monakov <amonakov@ispras.ru> wrote:
> On Mon, 7 Aug 2017, Michael Matz wrote:
>> > I am looking for a run-time test that breaks the unwinder.
>>
>> I don't have one handy.  Idea: make two threads, one endlessly looping in
>> the "frame-less" function, the other causing a signal to the first thread,
>> and the signal handler checking that unwinding up to the caller of
>> frame_less() is possible via %[er]bp chaining.
>
> You'd probably have to arrange for frame_less to modify %rbp; otherwise
> unwinding might "appear to work" by virtue of %rbp being valid for the
> outer frame.
>
> I think one specific, real-life use case that may be hurt by
> this change is using linux-perf with backtrace recording, for programs with
> hot functions that don't otherwise access the stack (which is plausible for
> leaf functions with hot loops).
>
> Alexander
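
The %rbp chaining check discussed above can be sketched in isolation. This
is only an illustration, not the proposed run-time test: it walks a
synthetic array laid out like x86-64 frame records (saved %rbp first, return
address second) instead of a live stack, so it does not depend on compiler
flags. The names struct frame, walk_rbp_chain, demo_intact_chain and
demo_clobbered_chain are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 frame record when a function keeps a frame pointer:
   the saved caller %rbp sits at [%rbp], the return address at [%rbp+8].  */
struct frame
{
  uintptr_t saved_rbp;
  uintptr_t return_address;
};

/* Count the frame records reachable from FP by following saved %rbp
   values.  A record is only read if it is pointer-aligned and lies
   below STACK_TOP, and the chain must move strictly toward older
   (higher-address) frames, so a garbage %rbp ends the walk.  */
int
walk_rbp_chain (uintptr_t fp, uintptr_t stack_top, int max_depth)
{
  int depth = 0;

  assert (max_depth > 0);
  while (depth < max_depth)
    {
      if (fp == 0
          || fp % sizeof (uintptr_t) != 0
          || fp + sizeof (struct frame) > stack_top)
        break;
      depth++;
      uintptr_t next = ((const struct frame *) fp)->saved_rbp;
      if (next <= fp)
        break;
      fp = next;
    }
  return depth;
}

/* Three well-formed records; the outermost one has a zero saved %rbp.  */
int
demo_intact_chain (void)
{
  struct frame stack[4] = { { 0, 0 } };

  stack[0].saved_rbp = (uintptr_t) &stack[1];
  stack[1].saved_rbp = (uintptr_t) &stack[2];
  return walk_rbp_chain ((uintptr_t) &stack[0], (uintptr_t) &stack[4], 16);
}

/* Same layout, but the innermost record holds a misaligned value,
   modelling a function that used %rbp as a scratch register.  */
int
demo_clobbered_chain (void)
{
  struct frame stack[4] = { { 0, 0 } };

  stack[0].saved_rbp = (uintptr_t) &stack[1] + 1;
  stack[1].saved_rbp = (uintptr_t) &stack[2];
  return walk_rbp_chain ((uintptr_t) &stack[0], (uintptr_t) &stack[4], 16);
}
```

In a real test the starting value would presumably come from the interrupted
context inside the signal handler, and the upper bound from an address
recorded when the thread started; both of those are assumptions left out of
this sketch.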

This generated code is silly and provides very little benefit:

[hjl@gnu-6 tmp]$ cat x.c
extern void bar (void);

void
foo (void)
{
  bar ();
}
[hjl@gnu-6 tmp]$ gcc -fno-omit-frame-pointer x.c -S -O2
[hjl@gnu-6 tmp]$ cat x.s
	.file	"x.c"
	.text
	.p2align 4,,15
	.globl	foo
	.type	foo, @function
foo:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	popq	%rbp
	.cfi_def_cfa 7, 8
	jmp	bar
	.cfi_endproc
.LFE0:
	.size	foo, .-foo
	.ident	"GCC: (GNU) 7.1.1 20170709 (Red Hat 7.1.1-4)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-6 tmp]$

When another compiler does this optimization, applications can't
rely on seeing

	pushq	%rbp
	movq	%rsp, %rbp
	popq	%rbp
	jmp	bar

When the Linux/x86-64 kernel is compiled with -fno-omit-frame-pointer,
this optimization removes more than 730 instances of

	pushq	%rbp
	movq	%rsp, %rbp
	popq	%rbp

Can we apply this optimization when the function body has fewer than 6
instructions, similar to ix86_pad_short_function?
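
As a rough sketch of that restriction, under my reading that it combines
"no stack access" with a size limit on the function body: the predicate
name, its parameters and the constant below are hypothetical and do not
correspond to existing GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limit taken from the "fewer than 6 instructions" wording.  */
enum { SHORT_BODY_INSN_LIMIT = 6 };

/* Return true if the frame pointer setup could be skipped: the function
   must not access the stack and its body must be shorter than the limit.  */
bool
may_skip_frame_pointer (bool accesses_stack, int body_insn_count)
{
  assert (body_insn_count >= 0);
  return !accesses_stack && body_insn_count < SHORT_BODY_INSN_LIMIT;
}
```

With such a limit, the one-instruction foo above would still lose its
push/mov/pop, while a longer leaf function with a hot loop would keep its
frame pointer.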

-- 
H.J.

