Lord knows how GCC does register allocation right now. There are two files that assist in register allocation: [gccsource:local-alloc.c] that does local register allocation, and [gccsource:global.c] that performs some sort of priority coloring register allocation. But in the end, reload may decide differently, and it sometimes makes stupid choices.

In the last 5 years, at least two efforts at writing a graph coloring register allocator for GCC have failed. The main reasons for the failure of the first project by Dave Miller are unknown, no patches from that project ever made it to any mailing list. The latest effort wasnew-ra the new register allocator that is on the new-regalloc-branch in CVS. (This project produced code which was incorporated into several releases of GCC made between mid-2002 and late 2004, but it never worked reliably, and was removed from mainline in January 2005.) It appears that this project has been abandoned as well. The reasons for the failure of this project, as well as the things it did successfully, were discussed in a long thread that starts here.

If anyone wants to take on a new new-ra effort (and by all means, do not include thenewword in the name of such a project ;-), below is a short list of interesting papers to read. These papers all are classic works in the area of graph coloring register allocation.

  1. "How to Build an Interference Graph" Cooper, Harvey, and Torczon

  2. "Improvements to Graph Coloring Register Allocation" Briggs, Cooper, and Torczon

  3. "A generalized algorithm for graph-coloring register allocation" Smith, Ramsey, and Holloway

  4. "Live Range Splitting in a Graph Coloring Register Allocator" Cooper and Simpson

  5. "Spill Code Minimization via Interference Region Spilling" Bergner, Dahl, Engebretsen, and O'Keefe

  6. "Optimistic Register Coalescing" Park and Moon

You should also read the following papers that you can find in the GCC summit proceedings of 2003 and 2004:

  1. "Design and Implementation of a Graph Coloring Register Allocator for GCC" Michael Matz, 2003

  2. "Register Rematerialization In GCC" Mukta Punjani, 2003

  3. "Optimal Stack Slot Assignment in GCC" Naveem Sharma and Sanjiv Kumar Gupta, 2003

  4. "Fighting register pressure in GCC" Vladimir Makarov, 2004

  5. "Addressing Mode Selection in GCC" Naveen Sharma, 2004

The first two papers discuss the now failednew-raproject. Vladimir Makarov's paper gives a good overview of the existing register allocator. The other two discuss two problems with register allocation for small displacement embedded targets that are typical for GCC.

There are some really good ideas in the new-regalloc-branch such as pre-reload.c and a new RTL mode in [gccsource:recog.c] that is almost strict RTL, but still with pseudo-registers instead of only hard registers.

Unfortunately much of the implementation of pre-reload.c and really much of basically everything on the branch, is crufty if not just junk. To summarize:

  1. The job of the register allocator is to ensure that the machine resources are used optimally. Selecting instruction alternatives (and satisfying constraints) should happen before regalloc as much as possible, and in general machine descriptions should not rely on reload to fix all the silly insns that look nothing like the actual machine instructions. Otherwise regalloc simply does not get all the information it needs, and reload ends up replacing most of your spill code with more spill code. This is pre-reload on new-regalloc-branch, which is an excellent idea except that pre-reload.c is just a stripped version of reload{,1}.c -- something new-ra was supposed to get rid of. It could be written in a saner way as it works just on pseudos and hence has a little more freedom. Of course this would have taken longer ;)

  2. Web class, replaces regclass and choose register classes for webs instead of pseudos. This also includes splitting webs if a register in a web really wants to be in two different classes to satisfy constraints in two different insns. Right now, as far as I understand, regclass just picks one and lets reload figure out how to fix up that mistake.
  3. Reload could ultimately disappear. Surely, since there used to be only 37 reloads in a GCC bootstrap on new-regalloc-branch, it can be stripped down much (i.e. no reload inheritance, which is a very messy part of reload). Reload inheritance is a nice concept, but the implementation is just about incomprehensible and unmaintainable. Given the size of it, it also doesn't really work all that well - witness the post-reload attempts like reload_cse_regs which fix up what reload left behind.

Jeff Law suggested an alternative approach, though not one for the faint of heart: "Instead of rewriting the register allocator, let's start with rewriting reload, then global & local rather than the other way around. In fact, I'm [sic] probably start from the end of reload and work backwards within reload itself. I wouldn't be terribly surprised if we could formulate most of reload's actions as a subclass of register allocation via graph coloring." (Too bad the typo "I'm probably start" is more likely to be "I'd probably start" than "I'm probably starting"...)

Reload guru Bernd Schmidt has a patch that rewrites the uglier parts of reload, eliminating over 2000 lines of code in the meanwhile; latest update is at http://gcc.gnu.org/ml/gcc/2005-01/msg01122.html. Citing Bernd,

"What I did was to detect dependencies between reload insns and create an ordering with a mini-scheduler. All information about reloads and reload insns is kept around while processing the insns, and inheritance is performed as an additional pass over this information. Inheritance becomes a localized pass instead of bits of code strewn across multiple files, and since it has more global information to work with, it can make better choices (in theory, anyway)."

Another big problem with writing a new register allocator is dealing with subregs. Jim Wilson prepared a nice list of what GCC uses subregs for:

  1. Extracting part of a register. The register allocator relies on the fact that pseudo-regs have unique shared RTL, and thus register allocation can be performed with a single store to REGNO (reg). See alter_reg. If pseudo-regs were not unique, then we would have to scan all RTL to find all references, and fix them all, which would be impractical. Thus any reference to a pseudo-reg in a different mode than its native mode requires a subreg. This is the original use for subregs.
  2. (a) Extracting a low part subset of a word sized or smaller register. This occurs for example when casting an int register to a char. This generally has no effect on register allocation, as long as the constraints matched by the pattern are satisfied.
  3. (b) Extracting a non-low part subset of a word sized of smaller register. This is poorly supported and probably doesn't work in most places. It wasn't even possible to express this until subregs were changed to use byte offsets instead of word offsets. This probably doesn't occur much if at all in practice.
  4. (c) Extracting a low part subset of a larger than word sized register. This is like 1a above, except that the meaning also depends on the word endianness.
  5. (d) Extracting a non-low part subset of a larger than word sized register. This is well defined only cases where the size and alignment are word sized or a multiple of word sized. This is used when we want to perform operations piecemeal. For instance, moving a double word register one register at a time. The meaning depends on word endianness, but otherwise isn't much different from 1a above.
  6. Mode punning. This can result from type punning, e.g. using SImode to access an SFmode register. This can also be used in conjunction with 1d above, e.g. splitting a DFmode register into two SImode parts. This is also sometimes an artifact of how an md file is written. Some targets have restrictions on mode punning, so this is valid only when CANNOT_CHANGE_MODE_CLASS allows it. Otherwise, this generally doesn't affect register allocation as long as constraints match.
  7. Making md file RTL match hardware. We don't have RTL operations for all possible hardware operations. Also, most RTL operators have assumptions about the modes of their operands that must be maintained. subregs are convenient for solving these kinds of problems. A subreg can be used to change the size and/or mode class of an operand or an operator as necessary. A subreg can be used to make a value larger when neither zero_extend nor sign_extend is appropriate. This is usually the case when the extra bits are don't care bits. These have no effect on register allocation, as they are not part of the operands.
  8. Paradoxical subregs. A paradoxical subreg is used when we operate on a register using a larger mode than the native mode of the register. When we have a paradoxical subreg, all of extra bits above the native mode of the register are don't care bits. For instance, on an alpha, which lacks many 32-bit operations, we perform 32-bit addition by using DImode paradoxical subregs of SImode pseudo-regs. The upper 32-bits of the register are don't care bits.
  9. (a) Word sized or smaller paradoxical subreg of a word sized of smaller pseudo reg. Generally no effect on register allocation as long as the constraints match. This is the alpha addition case mentioned above.
  10. (b) Larger than word sized paradoxical subreg of a pseudo reg. This may affect register allocation, as we may need to allocate more space for a pseudo register than its native size. Since the extra bits are don't care bits, we can sometimes optimize away the need to allocate more space, so we don't always need the extra space. If we force the pseudo into a stack slot, then we do need the extra space in the stack slot.

None: RegisterAllocation (last edited 2008-01-10 19:38:39 by localhost)