This is the mail archive of the mailing list for the GCC project.
Re: [RFC][AArch64] function prologue analyzer in linux kernel
- From: Will Deacon <will dot deacon at arm dot com>
- To: AKASHI Takahiro <takahiro dot akashi at linaro dot org>
- Cc: "Richard Earnshaw (lists)" <Richard dot Earnshaw at arm dot com>, GCC Development <gcc at gcc dot gnu dot org>
- Date: Tue, 12 Jan 2016 18:04:34 +0000
- Subject: Re: [RFC][AArch64] function prologue analyzer in linux kernel
- References: <567BA582 dot 4060707 at linaro dot org> <20160107142247 dot GF23028 at arm dot com> <568E7C8C dot 8000009 at arm dot com> <568F4AE0 dot 7070206 at linaro dot org> <20160108155344 dot GD11228 at arm dot com> <56949911 dot 7000902 at linaro dot org>
On Tue, Jan 12, 2016 at 03:11:29PM +0900, AKASHI Takahiro wrote:
> On 01/09/2016 12:53 AM, Will Deacon wrote:
> >On Fri, Jan 08, 2016 at 02:36:32PM +0900, AKASHI Takahiro wrote:
> >>On 01/07/2016 11:56 PM, Richard Earnshaw (lists) wrote:
> >>>On 07/01/16 14:22, Will Deacon wrote:
> >>>>On Thu, Dec 24, 2015 at 04:57:54PM +0900, AKASHI Takahiro wrote:
> >>>>>So I'd like to introduce a function prologue analyzer to determine
> >>>>>the size allocated by a function's prologue and deduct it from "Depth".
> >>>>>My implementation of this analyzer has been submitted to
> >>>>>linux-arm-kernel mailing list.
> >>>>>I borrowed some ideas from gdb's analyzer, especially a loop of
> >>>>>instruction decoding that stops on exit from a basic block,
> >>>>>but implemented my own simplified one because the gdb version seems to do
> >>>>>a bit more than what we expect here.
> >>>>>Anyhow, since it is somewhat heuristic (and may not be maintainable in
> >>>>>the long term), could you review it from a broader viewpoint of toolchain,
> >>>>My main issue with this is that we cannot rely on the frame layout
> >>>>generated by the compiler and there's little point in asking for
> >>>>commitment here. Therefore, the heuristics will need updating as and
> >>>>when we identify new frames that we can't handle. That's pretty fragile
> >>>>and puts us on the back foot when faced with newer compilers. This might
> >>>>be sustainable if we don't expect to encounter much variation, but even
> >>>>that would require some sort of "buy-in" from the various toolchain
> >>>>GCC already has an option (-fstack-usage) to determine the stack usage
> >>>>on a per-function basis and produce a report at build time. Why can't
> >>>>we use that to provide the information we need, rather than attempt to
> >>>>compute it at runtime based on your analyser?
> >>>>If -fstack-usage is not sufficient, understanding why might allow us to
> >>>>propose a better option.
> >>>Can you not use the dwarf frame unwind data? That's always sufficient
> >>>to recover the CFA (canonical frame address - the value in SP when
> >>>executing the first instruction in a function). It seems to me it's
> >>>unlikely you're going to need something that's an exceedingly high
> >>>performance operation.
> >>Thank you for your comment.
> >>Yeah, but we need some utility routines to handle unwind data (.debug_frame).
> >>In fact, someone has already attempted to merge (part of) libunwind into
> >>the kernel, but it was rejected by the kernel community (including Linus,
> >>if I remember correctly). It seems that they thought the code was still buggy.
> >The ARC guys seem to have sneaked something in for their architecture:
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arc/kernel/unwind.c
> >so it might not be impossible if we don't require all the bells and
> >whistles of libunwind.
> Thanks. I didn't notice this code.
> >>That is one of the reasons that I wanted to implement my own analyzer.
> >I still don't understand why you can't use -fstack-usage. Can you please
> >tell me why that doesn't work? Am I missing something?
> I don't know how gcc calculates the usage here, but I guess it would be more
> robust than my analyzer.
> The issues that come to mind are
> - -fstack-usage generates a separate output file, *.su, so we have to
> arrange for them to be incorporated into the kernel binary.
That doesn't sound too bad to me. How much data are we talking about here?
> This implies that the (common) kernel makefiles might have to be changed a bit.
> - worse, what about the kernel module case? We will have no way to let the kernel
> know the stack usage without adding an extra step at load time.
We can easily add a new __init section to modules, which is a table
representing the module functions and their stack sizes (like we do
for other things like alternatives). We'd just then need to slurp this
information at load time and throw it into an rbtree or something.
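
To make the table idea concrete, here is a rough user-space sketch in C. It is only an illustration under stated assumptions: the names `stack_usage_entry`, `stack_usage_table_init` and `stack_usage_lookup` are hypothetical, not existing kernel interfaces, and a sorted array with binary search stands in for the rbtree to keep the example self-contained. The sizes are assumed to come from something like the -fstack-usage output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table entry: one per function, as it might be emitted
 * into a dedicated section at build time. */
struct stack_usage_entry {
    uintptr_t func_start;  /* address of the function's first instruction */
    uint32_t  stack_size;  /* bytes, e.g. as reported by -fstack-usage */
};

/* Order entries by ascending start address. */
static int cmp_entry(const void *a, const void *b)
{
    const struct stack_usage_entry *x = a, *y = b;

    return (x->func_start > y->func_start) - (x->func_start < y->func_start);
}

/* "Load time" step: sort once so that later lookups can binary-search.
 * (A sorted array is used here in place of an rbtree.) */
static void stack_usage_table_init(struct stack_usage_entry *tbl, size_t n)
{
    assert(tbl || n == 0);
    qsort(tbl, n, sizeof(*tbl), cmp_entry);
}

/* Return the entry with the greatest func_start <= pc, or NULL if pc lies
 * below every function in the table. */
static const struct stack_usage_entry *
stack_usage_lookup(const struct stack_usage_entry *tbl, size_t n, uintptr_t pc)
{
    size_t lo = 0, hi = n;

    assert(tbl || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tbl[mid].func_start <= pc)
            lo = mid + 1;   /* candidate; look for a later start */
        else
            hi = mid;
    }
    return lo ? &tbl[lo - 1] : NULL;
}
```

Since the sketch stores no function end addresses, a pc beyond the last function still maps to the last entry; a real table would probably carry a length field or an end marker as well.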