Aligning stack offsets for spills

Jeff Law jlaw@tachyum.com
Tue Jun 8 14:47:26 GMT 2021



On 6/8/2021 8:08 AM, Michael Matz wrote:
> Hello,
>
> On Mon, 7 Jun 2021, Jeff Law wrote:
>
>> So, as many of you know I left Red Hat a while ago and joined Tachyum.  We're
>> building a new processor and we've come across an issue where I think we need
>> upstream discussion.
>>
>> I can't divulge many of the details right now, but one of the quirks of our
>> architecture is that reg+d addressing modes for our vector loads/stores
>> require the displacement to be aligned.  This is an artifact of how these
>> instructions are encoded.
>>
>> Obviously we can emit a load of the address into a register when the
>> displacement isn't aligned.  From a correctness standpoint that works
>> perfectly.  Unfortunately, it's a significant performance hit on some
>> standard benchmarks (SPEC) where we have a great number of spills of vector
>> objects into the stack at unaligned offsets in the hot parts of the code.
>>
>>
>> We've considered 3 possible approaches to solve this problem.
>>
>> 1. When the displacement isn't properly aligned, allocate more space in
>> assign_stack_local so that we can make the offset aligned.  The downside is
>> this potentially burns a lot of stack space, but in practice the cost was
>> minimal (16 bytes in a 9k frame).  From a performance standpoint this works
>> perfectly.
>>
>> 2. Abuse the register elimination code to create a second pointer into the
>> stack.  Spills would start as <virtual> + offset, then either get eliminated
>> to sp+offset' when the offset is aligned or gpr+offset'' when the offset
>> isn't properly aligned.  We started a bit down this path, but with #1 working
>> so well, we didn't get this approach to proof-of-concept.
>>
>> 3. Hack up the post-reload optimizers to fix things up as best as we can.
>> This may still be advantageous, but again with #1 working so well, we didn't
>> explore this in any significant way.  We may still look at this at some point
>> in other contexts.
>>
>> Here's what we're playing with.  Obviously we'd need a target hook to
>> drive this behavior.  I was thinking that we'd pass in any slot offset
>> alignment requirements (from the target hook) to assign_stack_local and
>> that would bubble down to this point in try_fit_stack_local:
> Why is the machinery involving STACK_SLOT_ALIGNMENT and
> spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for
> backing stack slots) not working for you?  If everything is set up
> correctly the input alignment to try_fit_stack_local ought to be correct
> already.
We don't need the MEM as a whole to be aligned, just the offset in the
address calculation, because of how we encode those instructions.  If I've
read that code correctly, it would arrange for a dynamic realignment of the
stack so that it could then align the slot.  None of that is necessary for
us and we'd like to avoid forcing the dynamic stack realignment.  Or did I
misread the code?
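
To make the distinction concrete, here is a minimal sketch of the idea
behind approach #1.  It is not the actual patch and not GCC code: the
names round_up_offset and place_slot are hypothetical, the frame is
modelled as a simple byte counter addressed as sp + displacement, and the
32-byte figure in the note below is an assumed alignment picked only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC function: round OFFSET up to the next
   multiple of ALIGN, which must be a power of two.  */
static int64_t
round_up_offset (int64_t offset, int64_t align)
{
  assert (align > 0 && (align & (align - 1)) == 0);
  return (offset + align - 1) & -align;
}

/* Hypothetical model of carving a slot out of a frame addressed as
   sp + displacement.  *FRAME_SIZE is the number of bytes handed out so
   far.  Only the displacement is padded to OFFSET_ALIGN; nothing here
   assumes or forces any particular alignment of sp itself.  */
static int64_t
place_slot (int64_t *frame_size, int64_t size, int64_t offset_align)
{
  int64_t disp = round_up_offset (*frame_size, offset_align);
  *frame_size = disp + size;
  return disp;
}
```

Starting from an empty frame, an 8-byte scalar slot placed with
offset_align 1 followed by a 32-byte vector slot placed with offset_align
32 puts the vector at displacement 32, at a cost of 24 bytes of padding.
The resulting address sp+32 is only as aligned as sp happens to be, which
is all the encoding needs, so no dynamic stack realignment is involved.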

jeff


More information about the Gcc-patches mailing list