[PATCH][AArch64] Use LDP/STP in shrinkwrapping
Wilco Dijkstra
Wilco.Dijkstra@arm.com
Mon Jan 8 13:31:00 GMT 2018
Segher Boessenkool wrote:
> On Fri, Jan 05, 2018 at 12:22:44PM +0000, Wilco Dijkstra wrote:
>> An example epilog in a shrinkwrapped function before:
>>
>> ldp x21, x22, [sp,#16]
>> ldr x23, [sp,#32]
>> ldr x24, [sp,#40]
>> ldp x25, x26, [sp,#48]
>> ldr x27, [sp,#64]
>> ldr x28, [sp,#72]
>> ldr x30, [sp,#80]
>> ldr d8, [sp,#88]
>> ldp x19, x20, [sp],#96
>> ret
>
> In this example, the compiler already can make a ldp for both x23/x24 and
> x27/x28 just fine (if not in emit_epilogue_components, then simply in a
> peephole); why did that not work? Or is this not the actual generated
> machine code (and there are labels between the insns, for example)?
This block originally had a label in it. Two blocks emitted identical restores and
then branched to the final epilogue. The final epilogue was then duplicated, so
we end up with two almost identical epilogues of 10 instructions (almost, since
there were 1-2 unrelated instructions in both blocks).
Peepholing is very conservative about instructions using SP and won't touch
anything frame-related. If this worked better, the backend could just
emit single loads/stores and let peepholing generate LDP/STP.
However, this is not the real issue. In the worst case the current code may
emit only LDR and STR. If there are multiple callee-saves in a block, we
want to use LDP/STP, and if there is an odd number of registers, we want
to add a callee-save from an inner block.
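
To illustrate the pairing on the unpaired loads from the epilogue quoted above, here is a minimal sketch in Python. It is not the patch itself: the function name pair_restores and the (register, offset) input format are made up for illustration, and it only models 64-bit registers whose save slots are 8 bytes apart. It does not model borrowing a callee-save from an inner block when the count is odd.

```python
def pair_restores(slots):
    """Turn callee-save restores into LDP where possible, else LDR.

    slots: list of (register_name, sp_offset) sorted by offset.
    Hypothetical helper for illustration only, not GCC code.
    """
    insns = []
    i = 0
    while i < len(slots):
        reg, off = slots[i]
        if i + 1 < len(slots):
            nreg, noff = slots[i + 1]
            # Only pair registers of the same class (x with x, d with d).
            same_class = reg[0] == nreg[0]
            # The two 64-bit save slots must be adjacent in the frame.
            adjacent = noff == off + 8
            # 64-bit LDP takes a 7-bit signed immediate scaled by 8.
            in_range = -512 <= off <= 504 and off % 8 == 0
            if same_class and adjacent and in_range:
                insns.append(f"ldp {reg}, {nreg}, [sp,#{off}]")
                i += 2
                continue
        # No suitable partner: fall back to a single load.
        insns.append(f"ldr {reg}, [sp,#{off}]")
        i += 1
    return insns


if __name__ == "__main__":
    # The six single loads from the quoted epilogue.
    singles = [("x23", 32), ("x24", 40), ("x27", 64),
               ("x28", 72), ("x30", 80), ("d8", 88)]
    for insn in pair_restores(singles):
        print(insn)
```

With that input, x23/x24 and x27/x28 become LDPs, while x30 and d8 stay as single loads because they are in different register classes.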
Another issue is that after the pro_and_epilogue pass I see multiple restores
of the same registers and then a branch to the same block. We should try
to avoid the unnecessary duplication.
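
As a rough sketch of how such duplicates could be spotted, the Python below groups blocks by their restore sequence and branch target. The block names, the dictionary layout and the function name are all hypothetical, and this is not how the pass represents blocks; it only shows the idea that blocks with the same restores and the same successor are candidates for sharing one copy.

```python
from collections import defaultdict


def find_duplicate_restore_blocks(blocks):
    """Group blocks that do identical restores and branch to the same target.

    blocks: dict mapping a block name to (list_of_restore_insns, target_name).
    Returns a list of sorted name lists, one per group with more than one block.
    Hypothetical helper for illustration only, not GCC code.
    """
    groups = defaultdict(list)
    for name, (restores, target) in blocks.items():
        # Identical restore sequence + identical successor = same key.
        groups[(tuple(restores), target)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    blocks = {
        "bb3": (["ldp x21, x22, [sp,#16]", "ldr x23, [sp,#32]"], "bb9"),
        "bb5": (["ldp x21, x22, [sp,#16]", "ldr x23, [sp,#32]"], "bb9"),
        "bb7": (["ldr x23, [sp,#32]"], "bb9"),
    }
    print(find_duplicate_restore_blocks(blocks))
```

Here bb3 and bb5 would be reported together, while bb7 restores a different set and is left alone.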
Wilco