Processor pipeline description - GNU Compiler Collection (GCC) Internals

Previous: Delay Slots, Up: Insn Attributes

10.18.8 Specifying processor pipeline description

To achieve better performance, most modern processors (super-pipelined, superscalar RISC, and VLIW processors) have many functional units on which several instructions can be executed simultaneously. An instruction starts execution if its issue conditions are satisfied. If not, the instruction is stalled until its conditions are satisfied. Such interlock (pipeline) delay causes interruption of the fetching of successor instructions (or demands nop instructions, e.g. for some MIPS processors).

There are two major kinds of interlock delays in modern processors. The first one is a data dependence delay determining instruction latency time. The instruction execution is not started until all source data have been evaluated by prior instructions (there are more complex cases when the instruction execution starts even when the data are not available but will be ready in given time after the instruction execution start). Taking the data dependence delays into account is simple. The data dependence (true, output, and anti-dependence) delay between two instructions is given by a constant. In most cases this approach is adequate. The second kind of interlock delays is a reservation delay. The reservation delay means that two instructions under execution will be in need of shared processors resources, i.e. buses, internal registers, and/or functional units, which are reserved for some time. Taking this kind of delay into account is complex especially for modern RISC processors.

The task of exploiting more processor parallelism is solved by an instruction scheduler. For a better solution to this problem, the instruction scheduler has to have an adequate description of the processor parallelism (or pipeline description). Currently GCC provides two alternative ways to describe processor parallelism, both described below. The first method is outlined in the next section; it was once the only method provided by GCC, and thus is used in a number of exiting ports. The second, and preferred method, specifies functional unit reservations for groups of instructions with the aid of regular expressions. This is called the automaton based description.

The GCC instruction scheduler uses a pipeline hazard recognizer to figure out the possibility of the instruction issue by the processor on a given simulated processor cycle. The pipeline hazard recognizer is automatically generated from the processor pipeline description. The pipeline hazard recognizer generated from the automaton based description is more sophisticated and based on a deterministic finite state automaton (DFA) and therefore faster than one generated from the old description. Furthermore, its speed is not dependent on processor complexity. The instruction issue is possible if there is a transition from one automaton state to another one.

You can use either model to describe processor pipeline characteristics or even mix them. You could use the old description for some processor submodels and the DFA-based one for other processor submodels.

In general, using the automaton based description is preferred. Its model is richer and makes it possible to more accurately describe pipeline characteristics of processors, which results in improved code quality (although sometimes only marginally). It will also be used as an infrastructure to implement sophisticated and practical instruction scheduling which will try many instruction sequences to choose the best one.