This is the mail archive of the
mailing list for the GCC project.
Re: [RFC][AArch64] function prologue analyzer in linux kernel
- From: AKASHI Takahiro <takahiro dot akashi at linaro dot org>
- To: "Richard Earnshaw (lists)" <Richard dot Earnshaw at arm dot com>, Will Deacon <will dot deacon at arm dot com>
- Cc: GCC Development <gcc at gcc dot gnu dot org>
- Date: Fri, 8 Jan 2016 14:36:32 +0900
- Subject: Re: [RFC][AArch64] function prologue analyzer in linux kernel
- References: <567BA582 dot 4060707 at linaro dot org> <20160107142247 dot GF23028 at arm dot com> <568E7C8C dot 8000009 at arm dot com>
On 01/07/2016 11:56 PM, Richard Earnshaw (lists) wrote:
On 07/01/16 14:22, Will Deacon wrote:
On Thu, Dec 24, 2015 at 04:57:54PM +0900, AKASHI Takahiro wrote:
I'm the author of the ftrace support for arm64 (aarch64) Linux. As part of
ftrace, we can use the "stack tracer", which reports the maximum usage
of the kernel stack:
We're probably missing some important background here -- I suspect most
of the GCC folk are wondering wtf this is and what it's doing on their
It's partly my fault, since I asked you to run this by the compiler guys,
but that's because I have concerns about the approach. See below.
# cat /sys/kernel/debug/tracing/stack_max_size
# cat /sys/kernel/debug/tracing/stack_trace
        Depth    Size   Location    (49 entries)
        -----    ----   --------
  0)     4088      16   __local_bh_enable_ip+0x18/0xd8
  1)     4072      32   _raw_read_unlock_bh+0x38/0x48
  2)     4040      32   xs_udp_write_space+0x44/0x50
  3)     4008      32   sock_wfree+0x88/0x90
  4)     3976      32   skb_release_head_state+0x70/0xa0
 44)      808      32   load_elf_binary+0x29c/0x10d0
 45)      776     224   search_binary_handler+0xbc/0x208
 46)      552      96   do_execveat_common.isra.15+0x4e4/0x690
 47)      456     112   SyS_execve+0x4c/0x60
 48)      344     344   el0_svc_naked+0x24/0x28
Here, "Depth" (and hence "Size") is determined, after scanning the stack,
from the saved frame pointer (more precisely, fp + 0x10) in each stack frame
rather than from the stack pointer, which is not saved. (Please note that
the arm64 kernel is always compiled with -fno-omit-frame-pointer.)
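
As an illustration of how the two columns relate, the following is a minimal sketch, not the kernel's code. It assumes that each "Size" is the difference between an entry's "Depth" and the next entry's "Depth", with the last entry keeping its own depth, which matches the rows shown in the listing. The names ROWS and sizes_from_depths are made up for this example.

```python
# Rows copied from the stack_trace listing: (depth, size, location).
ROWS = [
    (808, 32, "load_elf_binary"),
    (776, 224, "search_binary_handler"),
    (552, 96, "do_execveat_common.isra.15"),
    (456, 112, "SyS_execve"),
    (344, 344, "el0_svc_naked"),
]


def sizes_from_depths(depths):
    """Return per-entry sizes (hypothetical helper).

    Each size is the entry's depth minus the next entry's depth;
    the last entry has no successor, so it keeps its own depth.
    """
    following = list(depths[1:]) + [0]
    return [depth - nxt for depth, nxt in zip(depths, following)]


DEPTHS = [row[0] for row in ROWS]
SIZES = sizes_from_depths(DEPTHS)
```

Because every "Size" is derived from two "Depth" values, any offset in where "Depth" is sampled shifts bytes between a function and its neighbour in the listing.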
As fp is updated only after branching into a function, by a prologue that
allocates not only the function's stack frame but also the callee's local
variables, using this saved value of fp as "Depth", that is, as the sp of
the caller function, is not appropriate for calculating the stack size of
a function.
So I'd like to introduce a function prologue analyzer to determine
the size allocated by a function's prologue and deduct it from "Depth".
My implementation of this analyzer has been submitted to the
linux-arm-kernel mailing list.
I borrowed some ideas from gdb's analyzer, especially the instruction
decoding loop and the rule of stopping decoding at the end of a basic block,
but implemented my own simplified version because the gdb version seems to
do a bit more than we need here.
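
The general idea of such an analyzer can be sketched in user space as follows. This is only a rough illustration under stated assumptions, not the submitted patch and not gdb's code: it recognizes just two A64 forms that move sp downwards ("stp Xt, Xt2, [sp, #-imm]!" and "sub sp, sp, #imm"), stops at the first unconditional branch (b, bl, br, blr, ret) as a stand-in for the end of a basic block, and ignores everything else. The function name prologue_stack_size and the limit parameter are hypothetical.

```python
def prologue_stack_size(insns, limit=16):
    """Estimate the bytes allocated by a prologue (hypothetical helper).

    insns is a sequence of 32-bit A64 instruction words in program order.
    """
    size = 0
    for insn in insns[:limit]:
        # B / BL (immediate): treat as the end of the basic block.
        if (insn & 0x7C000000) == 0x14000000:
            break
        # BR / BLR / RET (register): also ends the basic block.
        if (insn & 0xFE000000) == 0xD6000000:
            break

        rn = (insn >> 5) & 0x1F   # base register field (31 means sp here)
        rd = insn & 0x1F          # destination field (31 means sp here)

        # STP (64-bit, pre-indexed) with sp as base: sp += imm7 * 8.
        if (insn & 0xFFC00000) == 0xA9800000 and rn == 31:
            imm7 = (insn >> 15) & 0x7F
            if imm7 & 0x40:       # sign-extend the 7-bit field
                imm7 -= 0x80
            if imm7 < 0:
                size += -imm7 * 8
        # SUB (64-bit, immediate) with sp as source and destination.
        elif (insn & 0xFF800000) == 0xD1000000 and rn == 31 and rd == 31:
            imm = (insn >> 10) & 0xFFF
            if insn & (1 << 22):  # optional "lsl #12" on the immediate
                imm <<= 12
            size += imm
    return size
```

For example, the words 0xA9BE7BFD (stp x29, x30, [sp, #-32]!), 0x910003FD (mov x29, sp) and 0x94000000 (bl) give 32. Every frame shape not covered by these two patterns would need another rule.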
Anyhow, since it is somewhat heuristic (and may not be maintainable in
the long term), could you review it from a broader toolchain viewpoint,
 aarch64_analyze_prologue() in gdb/aarch64-tdep.c
My main issue with this is that we cannot rely on the frame layout
generated by the compiler and there's little point in asking for
commitment here. Therefore, the heuristics will need updating as and
when we identify new frames that we can't handle. That's pretty fragile
and puts us on the back foot when faced with newer compilers. This might
be sustainable if we don't expect to encounter much variation, but even
that would require some sort of "buy-in" from the various toolchain
GCC already has an option (-fstack-usage) to determine the stack usage
on a per-function basis and produce a report at build time. Why can't
we use that to provide the information we need, rather than attempt to
compute it at runtime based on your analyser?
If -fstack-usage is not sufficient, understanding why might allow us to
propose a better option.
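
For reference, a build-time report of this kind could be consumed with something like the sketch below. It assumes the entry layout described in the GCC documentation for -fstack-usage (function identifier, number of bytes, one or more qualifiers) and that the function name is the last colon-separated part of the identifier, which holds for C but not for C++ names. The sample lines, including file names, line numbers and sizes, are made up for illustration, and parse_su is a hypothetical helper.

```python
# Made-up sample in the style of a .su file; the values are illustrative only.
SAMPLE = (
    "fs/exec.c:1570:5:search_binary_handler\t224\tstatic\n"
    "fs/exec.c:1650:12:do_execveat_common\t96\tdynamic,bounded\n"
)


def parse_su(text):
    """Map function name -> (bytes, [qualifiers]) for each well-formed line."""
    usage = {}
    for line in text.splitlines():
        # Split from the right so identifiers containing spaces stay intact.
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        ident, size, qualifiers = parts
        func = ident.rsplit(":", 1)[-1]  # assumes a C identifier (no "::")
        usage[func] = (int(size), qualifiers.split(","))
    return usage


USAGE = parse_su(SAMPLE)
```

A "dynamic" qualifier means the reported number is not the whole story for that function, which is one place where a static report and a runtime measurement can disagree.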
Can you not use the dwarf frame unwind data? That's always sufficient
to recover the CFA (canonical frame address - the value in SP when
executing the first instruction in a function). It seems to me it's
unlikely you're going to need something that's an exceedingly high
Thank you for your comment.
Yeah, but we need some utility routines to handle unwind data (.debug_frame).
In fact, someone has already attempted to merge (part of) libunwind into
the kernel, but it was rejected by the kernel community (including Linus,
if I remember correctly). It seems that they thought the code was still buggy.
That is one of the reasons that I wanted to implement my own analyzer.