Structure of GCC
As with most portable compilers, the compilation process of a GCC-based compiler can be conceptually split into three phases:
- There is a separate front end for each supported language. A front end takes the source code and does whatever is needed to translate that source code into a semantically equivalent, language independent abstract syntax tree (AST). The syntax and semantics of this AST are defined by the GIMPLE language, the highest level language independent intermediate representation GCC has.
- This AST is then run through a list of target independent code transformations that take care of such things as constructing a control flow graph, optimizing the AST for optimizing compilations, lowering to non-strict RTL (expand), and running RTL based optimizations for optimizing compilations. The non-strict RTL is then handed over to more low-level passes.
- The low-level passes are the passes that are part of the code generation process. The first job of these passes is to turn the non-strict RTL representation into strict RTL, or in other words, to go from RTL patterns that match define_insn definitions without taking constraints into consideration to RTL patterns that fully match the complete insn definition *including* all operand constraints. (Right now the one pass that takes care of this is reload, but this is suboptimal.) Other jobs of the strict RTL passes include scheduling, doing peephole optimizations, and emitting the assembly output.
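The difference between matching a pattern with and without its operand constraints can be illustrated with a small toy model. This is only a sketch and not GCC code: the operand kinds, the `OperandSpec` class, the hypothetical `ADD_PATTERN`, and the `toy_reload` helper are all assumptions made for illustration. It assumes that each operand of a pattern has a loose test (standing in for what is checked when constraints are ignored) and a tighter set of constraints, and that a reload-like step can always fall back to a register.

```python
from dataclasses import dataclass

# Toy operand kinds; real RTL operands carry far more detail.
REG, MEM, CONST = "reg", "mem", "const"


@dataclass(frozen=True)
class OperandSpec:
    predicate: frozenset   # kinds accepted when constraints are ignored
    constraint: frozenset  # kinds accepted once constraints are enforced


# Hypothetical "add" pattern (not taken from any real machine description):
# loosely it accepts memory operands, but its constraints demand a register
# destination and a register-or-constant source.
ADD_PATTERN = (
    OperandSpec(predicate=frozenset({REG, MEM}),
                constraint=frozenset({REG})),
    OperandSpec(predicate=frozenset({REG, MEM, CONST}),
                constraint=frozenset({REG, CONST})),
)


def matches_non_strict(pattern, operands):
    """Match without taking constraints into consideration."""
    return len(pattern) == len(operands) and all(
        kind in spec.predicate for spec, kind in zip(pattern, operands))


def matches_strict(pattern, operands):
    """Match the complete pattern, including all operand constraints."""
    return matches_non_strict(pattern, operands) and all(
        kind in spec.constraint for spec, kind in zip(pattern, operands))


def toy_reload(pattern, operands):
    """Replace every operand that violates its constraint with a register.

    Assumes each constraint set allows REG. A real pass would also have to
    emit the moves that load or store the replacement registers.
    """
    return tuple(kind if kind in spec.constraint else REG
                 for spec, kind in zip(pattern, operands))


insn = (MEM, MEM)  # an "add" with two memory operands
print(matches_non_strict(ADD_PATTERN, insn))
print(matches_strict(ADD_PATTERN, insn))
print(matches_strict(ADD_PATTERN, toy_reload(ADD_PATTERN, insn)))
```

In this model the memory-to-memory add is acceptable as non-strict RTL but not as strict RTL until the reload-like step has rewritten its operands.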
Neither the AST nor the non-strict RTL representation is completely target independent. The GIMPLE language itself is, however, and non-strict RTL is still not really machine assembly, so the passes that work on non-strict RTL can still be considered target independent to some extent (even passes like combine do not have to worry too much about the target machine). The passes working on strict RTL, by contrast, are really assembler optimizers, which clearly need to take into account far more information about the target architecture.
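The phases and representations described above can be summarized as a small table of passes, each consuming one representation and producing another. The following is a minimal sketch under stated assumptions: the `IR` enum, the `PIPELINE` table, and `run_pipeline` are hypothetical names, the pass names paraphrase the text rather than GCC's internal pass names, and the real pass list is much longer.

```python
from enum import Enum


class IR(Enum):
    SOURCE = "source code"
    GIMPLE = "GIMPLE"
    NON_STRICT_RTL = "non-strict RTL"
    STRICT_RTL = "strict RTL"
    ASSEMBLY = "assembly"


# (phase, pass name, representation consumed, representation produced)
PIPELINE = [
    ("front end",          "parse and translate",      IR.SOURCE,         IR.GIMPLE),
    ("target independent", "build control flow graph", IR.GIMPLE,         IR.GIMPLE),
    ("target independent", "GIMPLE optimizations",     IR.GIMPLE,         IR.GIMPLE),
    ("target independent", "expand",                   IR.GIMPLE,         IR.NON_STRICT_RTL),
    ("target independent", "RTL optimizations",        IR.NON_STRICT_RTL, IR.NON_STRICT_RTL),
    ("code generation",    "reload",                   IR.NON_STRICT_RTL, IR.STRICT_RTL),
    ("code generation",    "scheduling",               IR.STRICT_RTL,     IR.STRICT_RTL),
    ("code generation",    "peephole optimizations",   IR.STRICT_RTL,     IR.STRICT_RTL),
    ("code generation",    "emit assembly",            IR.STRICT_RTL,     IR.ASSEMBLY),
]


def run_pipeline(pipeline, start=IR.SOURCE):
    """Walk the passes in order and return the representations visited.

    Raises ValueError if a pass is handed a representation it does not
    expect, for example a strict RTL pass scheduled before reload.
    """
    current = start
    visited = [current]
    for phase, name, consumes, produces in pipeline:
        if consumes is not current:
            raise ValueError(
                f"{name} ({phase}) expects {consumes.value}, "
                f"got {current.value}")
        current = produces
        if visited[-1] is not current:
            visited.append(current)
    return visited


for ir in run_pipeline(PIPELINE):
    print(ir.value)
```

The check in `run_pipeline` reflects the ordering constraint implied by the text: a pass that works on strict RTL cannot run before the representation has been made strict.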
The source file hierarchy is described in the GCC documentation. See also the page on regenerating configure scripts.