Structure of GCC
As in most portable compilers, compilation with a GCC-based compiler can be conceptually split into three phases:
- There is a separate [:FrontEnd:front end] for each supported language. A front end takes the source code and does whatever is needed to translate it into a semantically equivalent, language-independent abstract syntax tree (AST). The syntax and semantics of this AST are defined by the ["GIMPLE"] language, the highest-level language-independent intermediate representation GCC has.
- This AST is then run through a list of [:MiddleEnd:target independent code transformations] that take care of such things as constructing a control flow graph, optimizing the [:TreeOptimizers:AST] for optimizing compilations, lowering to non-strict ["RTL"] (["expand"]), and running [:RTLOptimizers:RTL based optimizations] for optimizing compilations. The non-strict RTL is then handed over to the more low-level passes.
- The low-level passes are the passes that are part of the [:BackEnd:code generation] process. The first job of these passes is to turn the non-strict RTL representation into strict RTL: in other words, to go from RTL patterns that match define_insn definitions without taking constraints into consideration to RTL patterns that fully match the complete define_insn definition *including* all operand constraints. (Right now the one pass that takes care of this is ["reload"], but this is suboptimal.) Other jobs of the strict RTL passes include scheduling, doing peephole optimizations, and emitting the assembly output.
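The difference between non-strict and strict matching can be illustrated with a small toy model. The sketch below is not GCC code: `operand_spec`, `matches_nonstrict`, `matches_strict` and `toy_reload` are hypothetical names made up for illustration. It assumes that each operand slot of a pattern carries a loose predicate and a tighter constraint, and that a reload-like step fixes an operand that fails the constraint by copying it into a register.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy operand kinds; real RTL operands are far richer than this. */
enum operand_kind { OP_REG, OP_MEM, OP_CONST };

/* Hypothetical stand-in for one operand slot of an instruction pattern.
   Both arrays are indexed by enum operand_kind. */
struct operand_spec {
    bool predicate_ok[3];   /* kinds accepted by the loose predicate */
    bool constraint_ok[3];  /* kinds accepted by the operand constraint */
};

/* Non-strict match: only the predicate is consulted. */
static bool matches_nonstrict(const struct operand_spec *spec,
                              enum operand_kind kind)
{
    return spec->predicate_ok[kind];
}

/* Strict match: predicate and constraint must both accept the operand. */
static bool matches_strict(const struct operand_spec *spec,
                           enum operand_kind kind)
{
    return spec->predicate_ok[kind] && spec->constraint_ok[kind];
}

/* Toy "reload": an operand that already matches strictly is left alone;
   otherwise we pretend it is copied into a register first. */
static enum operand_kind toy_reload(const struct operand_spec *spec,
                                    enum operand_kind kind)
{
    /* Only operands that matched non-strictly should ever get here. */
    assert(matches_nonstrict(spec, kind));
    return matches_strict(spec, kind) ? kind : OP_REG;
}
```

With a spec whose predicate accepts every kind but whose constraint accepts only `OP_REG`, a memory operand matches non-strictly, fails the strict check, and passes it once `toy_reload` has replaced it with a register operand.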
Neither the AST nor the non-strict RTL representations are completely target independent, but the ["GIMPLE"] language is. In non-strict ["RTL"] form the representation is still not really machine assembly, so the passes that work on non-strict RTL can still be considered target independent to some extent (even passes like ["combine"] do not have to worry too much about the target machine). The passes working on strict RTL are really assembler optimizers, which clearly need to take into account far more information about the target architecture.
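The phases described above can be summarised as a chain of representations, each phase consuming one form and producing the next. The following is a minimal sketch under that reading. The `repr` enum, the `phase` struct and the table are illustrative assumptions and bear no relation to GCC's actual pass manager, and the phase names are just labels taken from the text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Representations named in the text, in the order they are produced. */
enum repr {
    REPR_SOURCE,
    REPR_GIMPLE,
    REPR_NONSTRICT_RTL,
    REPR_STRICT_RTL,
    REPR_ASSEMBLY
};

/* Hypothetical phase descriptor: what a phase reads and what it emits. */
struct phase {
    const char *name;
    enum repr input;
    enum repr output;
};

static const struct phase pipeline[] = {
    { "front end",         REPR_SOURCE,        REPR_GIMPLE },
    { "tree optimizers",   REPR_GIMPLE,        REPR_GIMPLE },
    { "expand",            REPR_GIMPLE,        REPR_NONSTRICT_RTL },
    { "RTL optimizers",    REPR_NONSTRICT_RTL, REPR_NONSTRICT_RTL },
    { "reload",            REPR_NONSTRICT_RTL, REPR_STRICT_RTL },
    { "strict RTL passes", REPR_STRICT_RTL,    REPR_STRICT_RTL },
    { "assembly output",   REPR_STRICT_RTL,    REPR_ASSEMBLY },
};

/* Runs the phases in order. Returns the final representation, or -1 if
   some phase is handed a representation it does not expect. */
static int run_pipeline(const struct phase *phases, size_t count,
                        enum repr start)
{
    enum repr current = start;
    for (size_t i = 0; i < count; i++) {
        if (phases[i].input != current)
            return -1;
        current = phases[i].output;
    }
    return (int)current;
}

/* Per the text, phases that read strict RTL are the ones that need the
   most knowledge about the target architecture. */
static bool works_on_strict_rtl(const struct phase *p)
{
    return p->input == REPR_STRICT_RTL;
}
```

Starting the table from `REPR_SOURCE` ends at `REPR_ASSEMBLY`, while starting from any other representation is rejected by the first phase. That ordering constraint is the only thing the sketch is meant to show.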
The source file hierarchy is described in its [http://gcc.gnu.org/onlinedocs/gccint/Source-Tree.html#Source-Tree documentation]. See also the page on ["Regenerating_GCC_Configuration"].