This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [RFC][AArch64] function prologue analyzer in linux kernel


On 07/01/16 14:22, Will Deacon wrote:
> Akashi,
> 
> On Thu, Dec 24, 2015 at 04:57:54PM +0900, AKASHI Takahiro wrote:
>> I'm the author of ftrace support on arm64 (aarch64) Linux. As part of
>> ftrace, we can use the "stack tracer", which reports the maximum usage
>> of the kernel stack:
> 
> We're probably missing some important background here -- I suspect most
> of the GCC folk are wondering wtf this is and what it's doing on their
> list ;)
> 
> It's partly my fault, since I asked you to run this by the compiler guys,
> but that's because I have concerns on the approach. See below.
> 
>> ---8<---
>> # cat /sys/kernel/debug/tracing/stack_max_size
>> 4088
>> # cat /sys/kernel/debug/tracing/stack_trace
>>         Depth    Size   Location    (49 entries)
>>         -----    ----   --------
>>   0)     4088      16   __local_bh_enable_ip+0x18/0xd8
>>   1)     4072      32   _raw_read_unlock_bh+0x38/0x48
>>   2)     4040      32   xs_udp_write_space+0x44/0x50
>>   3)     4008      32   sock_wfree+0x88/0x90
>>   4)     3976      32   skb_release_head_state+0x70/0xa0
>>  [snip]
>>  44)      808      32   load_elf_binary+0x29c/0x10d0
>>  45)      776     224   search_binary_handler+0xbc/0x208
>>  46)      552      96   do_execveat_common.isra.15+0x4e4/0x690
>>  47)      456     112   SyS_execve+0x4c/0x60
>>  48)      344     344   el0_svc_naked+0x24/0x28
>> --->8---
>>
>> Here, "Depth" (and hence "Size") is determined, after scanning the stack,
>> from the saved fp (more precisely, the saved fp + 0x10) in each stack
>> frame rather than from the stack pointer, which is not saved. (Please
>> note that the arm64 kernel is always compiled with
>> -fno-omit-frame-pointer.)
>>
>> As fp is updated after branching into a function, and allocates not only
>> a function's stack frame but also callee's local variables, using this
>> saved value of fp as "Depth", or sp of a caller function, is not
>> appropriate for calculating a stack size of a function.
>>
>> So I'd like to introduce a function prologue analyzer to determine
>> the size allocated by a function's prologue and deduct it from "Depth".
>> My implementation of this analyzer has been submitted to the
>> linux-arm-kernel mailing list[1].
>> I borrowed some ideas from gdb's analyzer[2], especially the instruction
>> decoding loop and stopping decoding on exit from a basic block,
>> but implemented my own simplified one because the gdb version seems to do
>> a bit more than what we expect here.
>> Anyhow, since it is somewhat heuristic (and may not be maintainable in
>> the long term), could you review it from the broader viewpoint of the
>> toolchain, please?
>>
>> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/393721.html
>> [2] aarch64_analyze_prologue() in gdb/aarch64-tdep.c
> 
> My main issue with this is that we cannot rely on the frame layout
> generated by the compiler and there's little point in asking for
> commitment here. Therefore, the heuristics will need updating as and
> when we identify new frames that we can't handle. That's pretty fragile
> and puts us on the back foot when faced with newer compilers. This might
> be sustainable if we don't expect to encounter much variation, but even
> that would require some sort of "buy-in" from the various toolchain
> communities.
> 
> GCC already has an option (-fstack-usage) to determine the stack usage
> on a per-function basis and produce a report at build time. Why can't
> we use that to provide the information we need, rather than attempt to
> compute it at runtime based on your analyser?
> 
> If -fstack-usage is not sufficient, understanding why might allow us to
> propose a better option.
> 
> Will
> 

Can you not use the dwarf frame unwind data?  That's always sufficient
to recover the CFA (canonical frame address - the value in SP when
executing the first instruction in a function).  It seems to me it's
unlikely you're going to need something that's an exceedingly high
performance operation.

R.
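
As an illustration of the arithmetic discussed in this thread: in the quoted
stack_trace output, each "Size" value is consistent with the difference
between one entry's "Depth" and the next entry's "Depth". The Python sketch
below reproduces that subtraction for entries 44 to 48 of the quoted output.
The helper name frame_sizes is hypothetical and is not part of ftrace. The
same subtraction would presumably apply to CFA values recovered from unwind
data, assuming (as on AArch64) that the call instruction itself pushes
nothing onto the stack.

```python
# Depth values copied from entries 44..48 of the quoted stack_trace output.
depths = [808, 776, 552, 456, 344]


def frame_sizes(depths):
    """Return one size per entry: its Depth minus the next entry's Depth.

    The last entry has no successor, so it keeps its own Depth
    (equivalent to subtracting 0).
    """
    following = depths[1:] + [0]
    return [d - n for d, n in zip(depths, following)]


print(frame_sizes(depths))  # [32, 224, 96, 112, 344], matching the Size column
```

Whichever way the per-frame boundary is obtained (saved fp, prologue
analysis, or unwind data), only the boundary values change; the subtraction
stays the same.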


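To make the idea of a prologue analyzer concrete, here is a deliberately
small Python sketch. It is not the code submitted in [1] and not gdb's
aarch64_analyze_prologue(); the function names and the max_insns limit are
assumptions made for illustration. It recognises only two A64 instructions
that move the stack pointer down (a pre-indexed 64-bit STP with SP as base,
and SUB SP, SP, #imm) and stops at the first B, BL or RET, which is a crude
stand-in for "exit from a basic block". Any other frame layout is silently
missed, which is exactly the fragility raised in the quoted reply.

```python
SP = 31  # register number used for SP in these encodings


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def prologue_stack_size(words, max_insns=8):
    """Sum the bytes by which recognised entry instructions lower SP.

    `words` is a list of 32-bit A64 instruction words starting at the
    function entry. Hypothetical helper: only two patterns are handled.
    """
    allocated = 0
    for word in words[:max_insns]:
        rn = (word >> 5) & 0x1F
        rd = word & 0x1F
        if (word & 0xFFC00000) == 0xA9800000 and rn == SP:
            # STP Xt1, Xt2, [SP, #imm]!  (imm7 is scaled by 8 bytes)
            imm = sign_extend((word >> 15) & 0x7F, 7) * 8
            if imm < 0:
                allocated += -imm
        elif (word & 0xFF800000) == 0xD1000000 and rn == SP and rd == SP:
            # SUB SP, SP, #imm12 {, LSL #12}
            imm = (word >> 10) & 0xFFF
            if word & (1 << 22):
                imm <<= 12
            allocated += imm
        elif (word & 0x7C000000) == 0x14000000 or word == 0xD65F03C0:
            # B, BL or RET: stop scanning at the end of the first block
            break
    return allocated


# stp x29, x30, [sp, #-32]! ; mov x29, sp
print(prologue_stack_size([0xA9BE7BFD, 0x910003FD]))  # 32
```

A build-time report such as -fstack-usage, or the unwind data suggested
above, avoids having to keep a pattern list like this in step with the
compiler.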