This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [RFC][AArch64] function prologue analyzer in linux kernel


Akashi,

On Thu, Dec 24, 2015 at 04:57:54PM +0900, AKASHI Takahiro wrote:
> I'm the author of ftrace support on arm64(aarch64) linux. As part of
> ftrace, we can utilize "stack tracer" which reports the maximum usage
> of kernel stack:

We're probably missing some important background here -- I suspect most
of the GCC folk are wondering wtf this is and what it's doing on their
list ;)

It's partly my fault, since I asked you to run this by the compiler guys,
but that's because I have concerns about the approach. See below.

> ---8<---
> # cat /sys/kernel/debug/tracing/stack_max_size
> 4088
> # cat /sys/kernel/debug/tracing/stack_trace
>         Depth    Size   Location    (49 entries)
>         -----    ----   --------
>   0)     4088      16   __local_bh_enable_ip+0x18/0xd8
>   1)     4072      32   _raw_read_unlock_bh+0x38/0x48
>   2)     4040      32   xs_udp_write_space+0x44/0x50
>   3)     4008      32   sock_wfree+0x88/0x90
>   4)     3976      32   skb_release_head_state+0x70/0xa0
>  [snip]
>  44)      808      32   load_elf_binary+0x29c/0x10d0
>  45)      776     224   search_binary_handler+0xbc/0x208
>  46)      552      96   do_execveat_common.isra.15+0x4e4/0x690
>  47)      456     112   SyS_execve+0x4c/0x60
>  48)      344     344   el0_svc_naked+0x24/0x28
> --->8---
> 
> Here, "Depth" (and hence "Size") is determined, after scanning a stack,
> by saved fp pointer (more precisely + 0x10) in a stack frame instead
> of (not saved) stack pointer. (Please note that arm64 kernel is always
> compiled with -fno-omit-frame-pointer.)
> 
> As fp is updated after branching into a function, and allocates not only
> a function's stack frame but also callee's local variables, using this
> saved value of fp as "Depth", or sp of a caller function, is not
> appropriate for calculating a stack size of a function.
> 
> So I'd like to introduce a function prologue analyzer to determine
> a size allocated by a function's prologue and deduct it from "Depth".
> My implementation of this analyzer has been submitted to
> linux-arm-kernel mailing list[1].
> I borrowed some ideas from gdb's analyzer[2], especially a loop of
> instruction decoding as well as stop of decoding at exiting a basic block,
> but implemented my own simplified one because gdb version seems to do
> a bit more than what we expect here.
> Anyhow, since it is somewhat heuristic (and may not be maintainable for
> a long term), could you review it from a broader viewpoint of toolchain,
> please?
> 
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/393721.html
> [2] aarch64_analyze_prologue() in gdb/aarch64-tdep.c
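
For readers unfamiliar with the report quoted above: in the visible tail of
the trace, each "Size" value is consistent with being the difference between
one entry's "Depth" and the next entry's, with the outermost entry keeping
its whole "Depth". The Python sketch below only reproduces that arithmetic
on the quoted numbers. It is an illustration, not the kernel's code, and the
function name frame_sizes is made up for this example.

```python
def frame_sizes(depths):
    """Return per-entry sizes for Depth values ordered innermost-first.

    Each size is this entry's depth minus the next (outer) entry's depth;
    the outermost entry is measured against zero.
    """
    sizes = []
    for i, depth in enumerate(depths):
        outer = depths[i + 1] if i + 1 < len(depths) else 0
        sizes.append(depth - outer)
    return sizes


# Depth values of entries 44..48 from the quoted stack_trace output.
tail_depths = [808, 776, 552, 456, 344]
print(frame_sizes(tail_depths))  # [32, 224, 96, 112, 344]
```

Because each "Depth" is derived from a saved fp rather than from sp, these
differences are not necessarily the frame sizes of the functions named on
the same line, which is the mismatch the proposed analyzer tries to correct.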

My main issue with this is that we cannot rely on the frame layout
generated by the compiler and there's little point in asking for
commitment here. Therefore, the heuristics will need updating as and
when we identify new frames that we can't handle. That's pretty fragile
and puts us on the back foot when faced with newer compilers. This might
be sustainable if we don't expect to encounter much variation, but even
that would require some sort of "buy-in" from the various toolchain
communities.

GCC already has an option (-fstack-usage) to determine the stack usage
on a per-function basis and produce a report at build time. Why can't
we use that to provide the information we need, rather than attempt to
compute it at runtime based on your analyser?

If -fstack-usage is not sufficient, understanding why might allow us to
propose a better option.
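
As a rough illustration of what consuming that build-time report could look
like: -fstack-usage writes a .su file per translation unit, with one line
per function holding a location-qualified name, a size in bytes and a
qualifier (static, dynamic or bounded), separated by tabs. The Python sketch
below turns such text into a lookup table. The sample lines, file name and
function names are invented for the example, and parse_su is a hypothetical
helper, not part of any existing tool.

```python
def parse_su(text):
    """Map function name -> (bytes, qualifier) from -fstack-usage output."""
    usage = {}
    for line in text.splitlines():
        fields = line.rstrip().split("\t")
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        location, size, qualifier = fields
        # location looks like file:line:column:function
        function = location.split(":", 3)[-1]
        usage[function] = (int(size), qualifier)
    return usage


# Invented sample input; real values come from the compiler.
sample = (
    "demo.c:3:5:leaf\t16\tstatic\n"
    "demo.c:10:5:uses_alloca\t48\tdynamic,bounded\n"
)
table = parse_su(sample)
print(table["leaf"])
```

Entries that are not marked static are lower bounds rather than exact
figures, so a consumer would have to decide how to treat them.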

Will

