Aligning stack offsets for spills
Tue Jun 8 14:08:03 GMT 2021
On Mon, 7 Jun 2021, Jeff Law wrote:
> So, as many of you know I left Red Hat a while ago and joined Tachyum. We're
> building a new processor and we've come across an issue where I think we need
> upstream discussion.
> I can't divulge many of the details right now, but one of the quirks of our
> architecture is that reg+d addressing modes for our vector loads/stores
> require the displacement to be aligned. This is an artifact of how these
> instructions are encoded.
> Obviously we can emit a load of the address into a register when the
> displacement isn't aligned. From a correctness point of view, that works perfectly.
> Unfortunately, it's a significant performance hit on some standard benchmarks
> (spec) where we have a great number of spills of vector objects into the stack
> at unaligned offsets in the hot parts of the code.
> We've considered 3 possible approaches to solve this problem.
> 1. When the displacement isn't properly aligned, allocate more space in
> assign_stack_local so that we can make the offset aligned. The downside is
> this potentially burns a lot of stack space, but in practice the cost was
> minimal (16 bytes in a 9k frame). From a performance standpoint this works well.
> 2. Abuse the register elimination code to create a second pointer into the
> stack. Spills would start as <virtual> + offset, then either get eliminated
> to sp+offset' when the offset is aligned or gpr+offset'' when the offset
> isn't properly aligned. We started a bit down this path, but with #1 working
> so well, we didn't get this approach to proof-of-concept.
> 3. Hack up the post-reload optimizers to fix things up as best as we can.
> This may still be advantageous, but again with #1 working so well, we didn't
> explore this in any significant way. We may still look at this at some point
> in other contexts.
> Here's what we're playing with. Obviously we'd need a target hook to
> drive this behavior. I was thinking that we'd pass in any slot offset
> alignment requirements (from the target hook) to assign_stack_local and
> that would bubble down to this point in try_fit_stack_local:
Why is the machinery involving STACK_SLOT_ALIGNMENT and
spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for
backing stack slots) not working for you? If everything is set up
correctly, the input alignment to try_fit_stack_local ought to be correct.
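The quoted message says that reg+d vector loads/stores need an aligned displacement. Otherwise the address must first be loaded into a register. The sketch below models that cost for a single spill access, under some assumptions:

- The required alignment is 16 bytes. The actual value is not given, so VEC_DISP_ALIGN is hypothetical.
- Forming base+disp in a scratch register takes exactly one extra instruction.
- The function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: required alignment (bytes) of the displacement in a reg+d
   vector load/store.  The actual value for the target is not stated.  */
#define VEC_DISP_ALIGN 16

/* True if DISP can be encoded directly in a reg+d vector access.
   C's % truncates toward zero, so a negative DISP such as -8 yields a
   nonzero remainder and is correctly rejected.  */
static bool
vec_disp_encodable (long disp)
{
  assert (VEC_DISP_ALIGN > 0);
  return disp % VEC_DISP_ALIGN == 0;
}

/* Rough instruction count for one vector spill access at base+DISP:
   the load/store itself, plus (by assumption) one instruction to form
   base+DISP in a scratch register when DISP cannot be encoded.  */
static int
vec_spill_access_cost (long disp)
{
  return vec_disp_encodable (disp) ? 1 : 2;
}
```

In a hot loop with many vector spills at unaligned offsets, that extra instruction per access is the overhead the message describes.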
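Approach 1 in the quoted list allocates extra space so that a slot's offset comes out aligned. The sketch below shows that padding arithmetic on a simplified frame that grows downward. This model is not GCC's assign_stack_local or try_fit_stack_local, and it makes these assumptions:

- The frame base is at offset 0.
- The distance from the frame base to sp is itself a multiple of the alignment, so aligned frame offsets stay aligned after elimination.
- The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round X down to a multiple of ALIGN (a power of two).  int64_t is
   two's complement, so masking rounds negative X toward -infinity,
   e.g. -24 -> -32 for ALIGN 16.  */
static int64_t
align_down (int64_t x, int64_t align)
{
  assert (align > 0 && (align & (align - 1)) == 0);
  return x & -align;
}

/* Simplified downward-growing frame: *FRAME_OFFSET is the current
   (non-positive) offset from the frame base.  Allocate SIZE bytes at an
   offset that is a multiple of OFFSET_ALIGN, return that offset, and
   store in *PADDING the extra bytes burned to reach it.  */
static int64_t
alloc_aligned_slot (int64_t *frame_offset, int64_t size,
                    int64_t offset_align, int64_t *padding)
{
  int64_t unaligned = *frame_offset - size;
  int64_t slot = align_down (unaligned, offset_align);
  *padding = unaligned - slot;
  *frame_offset = slot;
  return slot;
}
```

For example, suppose the frame offset is -8 and a 16-byte vector slot is requested with 16-byte offset alignment. The slot is placed at -32 at a cost of 8 bytes of padding. A following 16-byte slot lands at -48 with no padding. This is why the total overhead can stay small, as with the 16 bytes in a 9k frame reported above.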
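Approach 2 in the quoted list eliminates a spill address either to sp+offset' or to a second stack pointer gpr+offset''. The choice between them can be modeled as below, under these assumptions:

- Vector slots are 8-byte aligned while displacements need 16-byte alignment. Every unaligned slot then has the same residue, so a single second pointer at sp+8 covers all of them.
- SP_DELTA stands in for the elimination offset.
- All names are hypothetical, not GCC's elimination tables.

```c
#include <assert.h>
#include <stdint.h>

enum base_reg { BASE_SP, BASE_GPR };

struct eliminated_addr
{
  enum base_reg base;
  int64_t disp;
};

/* A spill slot starts as <virtual>+OFFSET; elimination adds SP_DELTA to
   reach an sp-relative displacement.  The second pointer holds
   sp + GPR_BIAS.  Use sp when its displacement is aligned, otherwise
   rebase onto the second pointer.  */
static struct eliminated_addr
eliminate_spill (int64_t offset, int64_t sp_delta, int64_t gpr_bias,
                 int64_t align)
{
  int64_t sp_disp = offset + sp_delta;
  struct eliminated_addr a;

  if (sp_disp % align == 0)
    {
      a.base = BASE_SP;
      a.disp = sp_disp;
    }
  else
    {
      a.base = BASE_GPR;
      a.disp = sp_disp - gpr_bias;
      /* Holds only if GPR_BIAS matches the slot's residue mod ALIGN.  */
      assert (a.disp % align == 0);
    }
  return a;
}
```

For example, with SP_DELTA 64, a slot at <virtual>-24 becomes sp+40, which is not 16-aligned. It is rewritten as gpr+32, the same address since gpr is sp+8. A slot at <virtual>-32 stays sp+32.

The cost of this scheme is an extra register reserved as the second pointer, plus its setup. Approach 1 only spends stack space.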