This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: RFC: Patch to align spills beyond what the stack supports


On 05/07/2015 11:52 AM, Steve Ellcey wrote:
I would like to get some feedback on an idea of how to spill registers
that require (or perhaps only prefer for performance reasons) an alignment
greater than that supported by the stack.

If you look at how GCC supports local variables with alignment requirements
greater than the stack supports, there are two methods.  One is to dynamically
realign the stack and the other is to alloca a block of space on the stack
and create a pointer to an aligned address within that space.
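
As a rough illustration of the second method, here is a minimal C sketch of
over-allocating a block and rounding a pointer up inside it.  The names
align_up, padded_size and demo_aligned_local are made up for this example
and are not GCC internals; a plain local array stands in for the alloca'd
block.

```c
#include <stddef.h>
#include <stdint.h>

/* Round PTR up to the next multiple of ALIGN.
   Assumption: ALIGN is a power of two.  */
static void *align_up(void *ptr, size_t align)
{
    uintptr_t p = (uintptr_t)ptr;
    return (void *)((p + (align - 1)) & ~(uintptr_t)(align - 1));
}

/* Bytes to reserve so that an ALIGN-aligned block of SIZE bytes
   always fits, whatever the alignment of the raw block.  */
static size_t padded_size(size_t size, size_t align)
{
    return size + align - 1;
}

/* Hypothetical stand-in for a function with an over-aligned local:
   RAW plays the role of the alloca'd space, ALIGNED is the pointer
   the rest of the function would use.  Returns 1 on success.  */
static int demo_aligned_local(void)
{
    enum { ALIGN = 16, SIZE = 32 };
    unsigned char raw[SIZE + ALIGN - 1];
    unsigned char *aligned = align_up(raw, ALIGN);

    /* The aligned block must lie on a 16-byte boundary and must
       still fit inside the padded raw block.  */
    return ((uintptr_t)aligned % ALIGN) == 0
           && aligned + SIZE <= raw + sizeof raw;
}
```

The cost is up to ALIGN - 1 bytes of padding per block, plus a pointer that
has to be kept live for as long as the aligned object is in use.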

The advantage of the first approach is that in addition to aligning local
variables, aligning the stack will align spill slots since they are also
stored on the stack.  The disadvantage with this approach is that it is
complicated, involves a fair amount of target-dependent changes, and risks
breaking a target's ABI if not done very carefully.  The x86 target is the
only one that has implemented this approach as far as I can see.

The advantage of the second approach is that it is target independent and
doesn't risk breaking a target's ABI.  The disadvantage is that it doesn't
help with aligning spill slots since it isn't actually changing the stack's
alignment.

After spending some time trying to implement the dynamic stack alignment
method for MIPS, I decided to try the second approach and extend the alloca
method to spills.

My approach was to create a tree pass that tried to guess at how many
aligned spill slots might be needed in a routine and then allocate an
aligned local variable to hold those spills.  This variable would get the
needed alignment via the existing alloca method used for local variables.

This guessing of how many spills we may need is the main drawback to my
approach because we have no way of actually knowing how many (if any) spill
slots we will end up needing.  The advantage of this approach is that
the only real target-dependent code I needed to create was to set a hard
register that could point to the spill variable so that the lra-spill code
could use it when needed for spills.  I didn't implement anything for
non-lra register allocation.
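
To make the trade-off concrete, here is a small C model of a pre-sized
spill area.  It is only a sketch under assumptions: struct spill_area and
spill_slot_alloc are hypothetical names, not code from the patch or from
lra-spills.c.  The base pointer stands in for the hard register, and
nslots stands in for the guess made by the early pass.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an aligned spill area whose size was
   guessed before register allocation.  Not GCC code.  */
struct spill_area {
    unsigned char *base;  /* aligned base; stands in for the hard register */
    size_t slot_size;     /* bytes per slot, e.g. 16 for a vector register */
    size_t nslots;        /* number of slots guessed by the early pass */
    size_t used;          /* slots handed out so far */
};

/* Return the next free slot, or NULL if the guess was too small.
   Slots stay aligned as long as BASE is aligned and SLOT_SIZE is a
   multiple of the required alignment.  */
static void *spill_slot_alloc(struct spill_area *a)
{
    if (a->used >= a->nslots)
        return NULL;
    return a->base + a->slot_size * a->used++;
}
```

In this model the NULL return marks the case where more aligned spills are
needed than were guessed; a real implementation would have to fall back to
an unaligned slot, grow the area, or give up at that point.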

I have been thinking that it might be possible to tag the alloca that
allocates the spill area somehow so that lra-spills could increase (or
decrease) it if needed but I haven't investigated that idea in detail yet.

What do people think about this idea?  Attached is a patch I created that
implements this idea for MIPS.  I am still doing some testing, mostly with
MSA vector registers (a MIPS feature not yet checked in).  Reading and writing
these 16-byte registers on a 16-byte boundary is more efficient than on an
8-byte boundary, but the O32 MIPS ABI only supports 8-byte stack alignment.

Is this an approach that might be approved for checkin with some further
work?

I think the first approach is generally better -- again, you're doing things fairly early in the pipeline to attack a problem that is much later in the pipeline. While you've probably minimized many of the ways we can lose with this approach today, it still strikes me as likely to be fragile over time.

For example, what happens if some later pass splits one (or more) of the original pseudos, then scheduling interleaves their lifetimes so that they can't share a stack slot? Boom, the estimations aren't going to work well and the user is going to have to start using PARAMs to avoid ICEs/asserts. Not good.

The dynamic realignment is somewhat painful, but it attacks the problem in the right place. There may be bits of it that we should be generalizing to make it easier to implement (since I don't think this is a long-term problem that will be isolated to x86 and MIPS).

jeff


