This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][AArch64 - v3] Simplify eh_return implementation
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- Cc: Ramana Radhakrishnan <ramana dot radhakrishnan at foss dot arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
- Date: Mon, 16 Jan 2017 11:05:06 +0000
- Subject: Re: [PATCH][AArch64 - v3] Simplify eh_return implementation
- Authentication-results: sourceware.org; auth=none
- Authentication-results: spf=pass (sender IP is 217.140.96.140) smtp.mailfrom=arm.com; gcc.gnu.org; dkim=none (message not signed) header.d=none;gcc.gnu.org; dmarc=bestguesspass action=none header.from=arm.com;
- Nodisclaimer: True
- References: <AM5PR0802MB2610A11A1EA0E7688CF75DAA83070@AM5PR0802MB2610.eurprd08.prod.outlook.com> <AM5PR0802MB2610E248D5510D8CFC1AD7B6831D0@AM5PR0802MB2610.eurprd08.prod.outlook.com> <d7bc617f-2f79-e8be-5f56-f5031dfe7d24@foss.arm.com> <AM5PR0802MB2610951AB0C1BD6FD5F748B183E50@AM5PR0802MB2610.eurprd08.prod.outlook.com> <20170113175441.GA38433@arm.com> <AM5PR0802MB2610042D0F1AE451109A8AC483780@AM5PR0802MB2610.eurprd08.prod.outlook.com>
- Spamdiagnosticmetadata: NSPM
- Spamdiagnosticoutput: 1:99
On Fri, Jan 13, 2017 at 07:50:48PM +0000, Wilco Dijkstra wrote:
> James Greenhalgh wrote:
>
> > I've been putting off reviewing this patch for a while now, because I don't
> > understand enough about the current eh_return code to understand why what
> > you're proposing is correct.
> >
> > The best way to progress this patch would be to go in to more detail as to
> > what the current code does, why we don't need it, and why your new
> > implementation is sufficient. The current code looks like it covers special
> > cases, and calls out DSE and CSELIB as needing special handling - your
> > new code doesn't.
> >
> > Could you explain your patch to me assuming I know very little about the
> > implementation, with particular focus on why we no longer need the special
> > cases?
>
> Well eh_return is far more complex than the few lines of assembly code it was
> intending to avoid...
>
> Basically the EH return feature wants the capability to either return
> normally or return to a previous frame after unwinding. Oddly enough the
> design is to use a single shared return sequence. The epilog is exactly like
> a normal epilog except that it has an extra input register which contains the
> stack adjustment which must be applied after the frame has been destroyed. An
> extra label is inserted before the eplilog which sets this stack adjustment
> to zero, and this is the entry point for a normal return.
>
> An EH return updates the return address, initializes the stack adjustment and
> jumps directly into the epilog (bypassing the zeroing of the adjustment).
> Sounds insane already? Well it gets worse...
>
> Given the return address is typically saved on the stack when a function
> makes a call, we now need to overwrite the saved LR outside the epilog. This
> poses a problem as we need to emit this store before the epilog is emitted,
> ie. before the offset of LR is even known. We also need to ensure the store
> is not removed.
>
> The existing implementation tries to do this by using a special eh_return
> pattern which emits the store as late as possible (at least after the frame
> layout has been finalized). However it doesn't do this late enough so DSE is
> still run after it...
>
> To try to avoid DSE removing the store, the existing implementation tries to
> make the store alias with the load by using the same base register and
> offset. Given the epilog always reads LR from SP, it follows that whenever FP
> is used for the store, it will be optimized away.
>
> Even when it uses SP as the base for the store, SP may be updated in the
> epilog before LR is read (for example alloca functions or frames with large
> locals or outgoing arguments will update SP before reading LR). Worse, the
> offset it uses for the store is incorrect, so if the store is not optimized,
> it just corrupts a random callee-save rather than overwriting LR.
>
> So basically to conclude the existing implementation only works for one
> specific case where there are a specific number of callee-saves, no floating
> point registers, no outgoing arguments, no alloca used, a small number of
> locals, and frame pointer is enabled.
>
> The new implementation is trivially correct by forcing the use of a frame
> pointer so that the location of LR is fixed, known early, and using a
> volatile means no optimization could remove the store. It's also much simpler
> by avoiding all the workarounds using patterns, splitters and trying to
> ensure the store appears not dead.
Great, thanks for the detailed explanation. If we can borrow most of this
text and place it in the code as a comment, I think this patch will be good
to go.
How about putting this slight rewording at the top of
aarch64_eh_return_handler_rtx:
/* The EH return feature wants the capability to either return
normally or return to a previous frame after unwinding.
The design is to use a single shared return sequence. The epilogue is
exactly like a normal epilogue except that it has an extra input
register which contains the stack adjustment which must be applied after
the frame has been destroyed. An extra label is inserted before the
epilogue which sets this stack adjustment to zero, and this is the
entry point for a normal return.
An EH return updates the return address, initializes the stack
adjustment and jumps directly into the epilogue (bypassing the zeroing
of the adjustment). But, it gets worse...
Given the return address is typically saved on the stack when a function
makes a call, we now need to overwrite the saved LR outside the epilogue.
This poses a problem as we need to emit this store before the
epilogue is emitted, i.e. before the offset of LR is even known. We also
need to ensure the store is not removed.
Previously, we did the by using a special eh_return pattern which
would emits the store as late as possible (at least after the frame
layout has been finalized). However, DSE would still be run after this
expansion so it would not be safe.
To try to avoid DSE removing the store, we previously tried to
make the store alias with the load by using the same base register
and offset. Given the epilogue always reads LR from SP, it follows
that whenever FP is used for the store, it will be optimized away.
Even when we used SP as the base for the store, SP may be updated in the
epilogue before LR is read (for example alloca functions or frames with
large locals or outgoing arguments will update SP before reading LR).
This simpler implementation is trivially correct as it forces the use
of a frame pointer for eh_return functions so that the location of LR
is fixed and known early, marking the store volatile means no optimization
is permitted to remove the store. It avoids workarounds using
patterns, splitters and other tricks which try to ensure the store
does not appear dead. */
I'd then be tempted to repeat this text directly above
MEM_VOLATILE_P (tmp) = true;
/* Marking the store volatile means no optimization is permitted to
remove the store. It avoids workarounds using patterns, splitters
and other tricks which try to ensure the store does not appear
dead. */
If you have other patches which have been stuck in review for a long time,
detailed explanations like this can really help speed up the process of
reviewing and approving them.
Thanks,
James