This is the mail archive of the
mailing list for the GCC project.
Re: Register Allocation
- From: Ian Lance Taylor <ian at airs dot com>
- To: Andrew MacLeod <amacleod at redhat dot com>
- Cc: gcc mailing list <gcc at gcc dot gnu dot org>
- Date: 18 Nov 2005 22:18:52 -0800
- Subject: Re: Register Allocation
- References: <email@example.com>
Andrew MacLeod <firstname.lastname@example.org> writes:
> The document is intended as a starting point and consists mostly of my
> thoughts at the moment. By the time the underlying RTL bits are done, I
> would like it to have evolved to include input from others. The more
> useful comments there are, the better the chance of us getting a decent
Thanks for thinking about this and writing this up.
When I look at reload, there are a few things that make the code
overly complex: reload inheritance, general hueristics to handle
memory addresses which are intended to work on all machines, and
secondary/tertiary reloads and the reload_in/out patterns. So I'd
like to quickly look at these in your framework.
Secondary/tertiary reloads and reload_in/out patterns are apparently
subsumed by the Spill Engine. Porting a target to the new framework
is going to require providing the Spill Engine with the appropriate
instructions for storing and loading each native register class,
including a description of any temporary or scratch registers
required. That seems reasonable. One minor thing that I don't quite
understand is that bit about reaching a limit of SPILLIDs. For
targets with limited displacement, what matters is when you start
needing larger displacements, which as far as I can see has more to do
with the overall stack frame size than the number of SPILLIDs.
Reload inheritance is a complex idea threaded through reload. The
goal is to reuse values which happen to already be in registers
whenever possible. In particular, to do this for reloads required for
a single insn. A typical simple example is 'a++' where 'a' winds up
living on the stack at a large unrepresentable offset. You don't want
to wind up computing the address of 'a' twice, although that is what a
naive spill code implementation. That example is too simple, I
suppose, but it has the right general flavor.
Reload inheritance is particularly important on machines with limited
numbers of registers and limited addressing capability--e.g., SH,
Thumb, MIPS16. The current reload implementation needs to do reload
inheritance as it goes, or it will run out of spill registers in some
cases. That is why a reload CSE pass after reload is not enough to
solve the problem.
I would be interested in your thoughts on handling this general issue
in your framework. One approach that seems reasonably promising to me
is to first do pseudo spill code for the instructions in a basic
block, then run CSE over the pseudo code, and only then identify real
spill registers to fit into the pseudo code.
The current reload pass includes general heuristics to handle
reloading memory addresses. This code knows things like "if stack
pointer plus displacement is not a valid memory address, try loading
the displacement into a register." Many targets currently rely on
those heuristics to generate valid code. I haven't been able to quite
pin down where this happens in your proposal. For example, it's easy
for an address to use the frame pointer and be valid before reload,
and then for reload to eliminate the frame pointer (in fact, in your
scheme, what does frame pointer elimination?) and produce an offset
from the stack pointer which is invalid. That is, spill code or frame
pointer elimination can generate invalid address, and something needs
to fix them up. Where does that happen, and how?
The other main thing that bugs me about the current register
allocation scheme is that there is no communication with the
scheduler. The scheduler can easily increase register pressure so
much that the resulting code is worse than if the scheduler had never
run at all. Naturally this happens particularly on architectures with
relatively few registers and relatively deep pipelines, like XScale.
I don't want to argue that your scheme needs to solve every problem.
But I think that any attempt to tackle the register allocator should
think seriously about better integration with the scheduler. I don't
see anything about that in your proposal.