Bug 100811 - Consider not omitting frame pointers by default on targets with many registers
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 10.3.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-28 11:34 UTC by Hadrien Grasland
Modified: 2023-05-26 13:47 UTC (History)
4 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Hadrien Grasland 2021-05-28 11:34:38 UTC
Since at least GCC 4 (Bugzilla's duplicate search points me to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=13822), GCC has been omitting frame pointers by default when optimizations are enabled, unless the extra -fno-omit-frame-pointer flag is specified.

As far as I know, the rationale for doing this was that:

- On architectures with very few general purpose registers like 32-bit x86, strictly following frame pointer retention discipline has a prohibitive performance cost.
- Debuggers do not need frame pointers to do their job, as they can leverage DWARF or PDB debug information instead.

While these arguments are valid, I would like to make the case that frame pointers may be worth keeping by default on hardware architectures where this is not too expensive (like x86_64), for the purpose of making software performance analysis easier.

Unlike debuggers, sampling profilers like perf cannot afford the luxury of walking the process stack using DWARF every time a sample is taken, as that would take too much time and bias the measured performance profile. Instead, when using DWARF for stack unwinding purposes, they have to take stack samples and post-process them after the fact. Furthermore, since copying the full program stack on every sample would generate an unbearable volume of data, they can usually only afford to copy the top of the stack (at most about 64KB for perf), which leads to corrupted stack traces when application stacks get deep or contain many or large stack-allocated objects.
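To illustrate the two unwinding strategies discussed above, here is a sketch of the corresponding perf invocations (the flags are real perf options; ./myapp is a placeholder binary):

```shell
# Frame-pointer unwinding: cheap per sample, but only produces complete
# stacks for code built with -fno-omit-frame-pointer.
perf record --call-graph fp ./myapp

# DWARF unwinding: perf copies a snapshot of the user stack on every sample
# (8KB by default, capped at 65528 bytes) and unwinds it in post-processing.
perf record --call-graph dwarf,65528 ./myapp

perf report
```

The second number after "dwarf" is the per-sample stack dump size; raising it toward the cap reduces truncated stack traces at the cost of a much larger perf.data file.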

For all these reasons, DWARF-based stack unwinding is a somewhat unreliable technique in profiling, where it's really hard to get >90% of your profile's stack traces to be correctly reconstructed all the way to _start or _clone. The remaining misreconstructed stack traces will translate into profile bias (underestimated "children" overhead measurements), and thus performance analysis mistakes.

To make matters worse, DWARF-based unwinding is relatively complex, and not every useful runtime performance analysis tool supports it. For example, BPF-based tracing tools, which are nowadays becoming popular due to their highly appealing ability to instrument every kernel or user function on the fly, do not currently support DWARF-based stack unwinding, most likely because feeding the DWARF debug info into the kernel-based BPF program would either be prohibitively expensive, a security hole, or a source of recursive tracing incidents (tracing tool generates syscalls of the kind that it is tracing, creating an infinite loop).
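As a concrete example of the BPF-side limitation, bpftrace's built-in ustack primitive unwinds user stacks by walking frame pointers in kernel context, so it silently produces truncated stacks for binaries built with -fomit-frame-pointer. A minimal profiling one-liner (requires root; the pid is a placeholder):

```shell
# Sample user-space stacks of process 1234 at 99 Hz and count unique traces.
# ustack walks the frame-pointer chain; without frame pointers the traces
# typically stop after one or two frames.
bpftrace -e 'profile:hz:99 /pid == 1234/ { @[ustack] = count(); }'
```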

Therefore, I think -fno-omit-frame-pointer should be the default on architectures where the price to pay is not too high (like x86_64), which should ensure that modern performance analysis tooling works on all popular Linux distributions without rebuilding the entire world. In this scheme, -fomit-frame-pointer would remain as a default option for targets where it is really needed (like legacy 32-bit x86), and as a specialist option for those cases where the extra ~1% of performance is really truly needed and worth its cost.
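The codegen difference at stake can be seen directly (a sketch assuming an x86_64 host with gcc and binutils; fp_demo.c is a throwaway file created here):

```shell
# Trivial leaf function to compile both ways.
cat > fp_demo.c <<'EOF'
int add(int a, int b) { return a + b; }
EOF

# -fno-omit-frame-pointer keeps the %rbp prologue even at -O2...
gcc -O2 -fno-omit-frame-pointer -c fp_demo.c -o fp_keep.o
# ...while plain -O2 on x86_64 omits the frame pointer by default.
gcc -O2 -c fp_demo.c -o fp_omit.o

# The first object saves/restores %rbp; the second never touches it.
objdump -d fp_keep.o | grep '%rbp'
objdump -d fp_omit.o | grep -c '%rbp' || true
```

The cost being debated is exactly those extra push/mov/pop instructions per function call, plus the loss of %rbp as a general-purpose register.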

What do you think?
Comment 1 Jakub Jelinek 2021-05-28 11:40:33 UTC
This is a very bad idea, slowing down everything for the rare case that something needs to be profiled.  And profilers can afford to unwind too if needed.
Comment 2 Florian Weimer 2021-05-28 11:46:14 UTC
I expect that profilers will soon be able to use the shadow stack on x86, which will be more efficient and reliable than traversing frame pointers.
Comment 3 Andrew Pinski 2021-05-28 14:06:48 UTC
Also, on say PowerPC, not omitting the frame pointer gives no benefit whatsoever with respect to backtracing.
Comment 4 AK 2023-05-25 16:06:38 UTC
On AArch64 (typically mobile platforms) app developers typically would enable frame pointers by default because it helps with crash reporting.
Comment 5 Andrew Pinski 2023-05-25 16:30:08 UTC
(In reply to AK from comment #4)
> On AArch64 (typically mobile platforms) app developers typically would
> enable frame pointers by default because it helps with crash reporting.

s/AArch64 (typically mobile platforms)/phone\/tablet/
Because aarch64 is also used for servers and network appliances.
Comment 6 Jakub Jelinek 2023-05-25 16:35:47 UTC
I think aarch64 defaults to -fno-omit-frame-pointer anyway.
    /* Disable fomit-frame-pointer by default.  */
    { OPT_LEVELS_ALL, OPT_fomit_frame_pointer, NULL, 0 },
Comment 7 Florian Weimer 2023-05-25 16:40:17 UTC
(In reply to Jakub Jelinek from comment #6)
> I think aarch64 defaults to -fno-omit-frame-pointer anyway.
>     /* Disable fomit-frame-pointer by default.  */
>     { OPT_LEVELS_ALL, OPT_fomit_frame_pointer, NULL, 0 },

It's required by some (all?) AArch64 ABIs, certainly on GNU/Linux.
Comment 8 AK 2023-05-25 17:16:42 UTC
Should we enable frame-pointers by default for RISCV64 as well?
Comment 9 Jakub Jelinek 2023-05-25 17:19:27 UTC
Why?
It should be enabled by default only if it is effectively mandated by the ABI and/or doesn't affect performance at all (and is actually useful in functions that don't need it like functions with alloca/VLAs; see the PowerPC case).
Comment 10 Xi Ruoyao 2023-05-26 13:37:21 UTC
Frankly, I've seen too much of this "slowing down everyone's system just for some debugging/profiling/whatever tools" thinking.  So I'd say a clear "no".

You may argue 1% performance degradation is acceptable, but the change would set a bad precedent for justifying other changes, and one day, after 10 such changes, we'll have accumulated a ~9.5% degradation.

If DWARF unwinding does not work properly, try to fix it or revise the DWARF specification, instead of making every system slower.
Comment 11 Jakub Jelinek 2023-05-26 13:47:40 UTC
DWARF unwinding works properly; it's just that the Linux kernel developers decided they don't want it in the kernel (I think they had an imperfect implementation in the past and it got removed).