Notes on handling of the basic block graph, or control flow graph
How it works now
The control flow graph is represented using the basic_block structures defined in [gccsource:basic-block.h]. Blocks are linked together as a double linked list, and pointers to in-edges and out-edges of each blocks are stored in a pair of vectors pointed to from each block.
At the tree level, the control flow graph is a fundamental part of the IL. In order to walk the IL, you walk the basic blocks, and process each statement that occurs in them. Changes to control flow are always reflected in the graph. Relevant source files are [gccsource:cfg.c], [gccsource:cfgbuild.c], [gccsource:tree-cfg.c], etc.
At the RTL level, the graph of basic_block structures still exists, but is treated as an annotation. There is one continuous insn chain for the entire function being compiled, and all control transfer operations are represented with explicit RTL. Each insn has a pointer to its containing basic block, but there are also a fair number of note insns whose sole function is to mark basic block boundaries (see [gccsource:insn-notes.def]). Very late in compilation, [gccsource:bb-reorder.c] decides what order the basic blocks will actually be emitted in, and therefore which edges are "fall through"; however, the insn chain means that a total ordering is maintained all the way through.
Some RTL optimizers mangle the basic block graph, or information carried along with the graph: notably, the delay-slot scheduler ([gccsource:reorg.c].
How it ought to work
First off, probably everyone agrees that the basic block graph ought to be preserved in accurate form all the way through the compilation. Optimizers should never destroy the graph.
Second, we shouldn't construct a complete insn chain for the function until just before final assembly output. Each basic block would have its own insn chain. This is already implemented in [gccsource:cfglayout.c], which transforms the control flow graph into the so-called cfglayout_mode. In this mode, the insn chain is broken up at basic block boundaries, which makes block reordering and duplication much easier. Currently this cfglayout mode is used by RTL passes that need to reorder basic blocks, like the new loop optimizer, superblock formation and basic block reordering. Eventually (well, hopefully soon) *all* RTL passes should be converted to be aware of basic block boundaries in the insns stream, and never cross them except by following edges in the control flow graph. When that is done, all passes can work in cfglayout mode, and we won't have to maintaion the complete insn chain anymore.
(Note that this would probably make if-conversion a lot more effective, for example. With the current complete insn chain idea, GCC can only if-convert a conditional jump if one of the arms of the conditional falls through to the next insn in the chain. In cfglayout_mode, this restriction wouldn't be necessary. Currently if-conversion ([gccsource:ifcvt.c]) can't work in cfglayout mode because it tries to merge blocks that are not really mergeable, but that's a separate issue.)
Third, some people think there should be no concept of fall-through edges at all before the basic block reordering pass (but this is more controversial). This entails not committing to the 'sense' of conditional branch instructions until that point, which is probably the biggest change. We wouldn't have explicit CODE_LABEL insns either; branches would look something like this
(jump_insn ### (set (pc)
(if_then_else (...)
(block_ref <bb 2>)
(block_ref <bb 4>)))
...)The existing basic-block reordering pass would be adapted to assemble the blocks into the desired order and rewrite the jump instructions accordingly. Decisions about partitioning code into "hot" and "cold" sections would also be taken at this point. CODE_LABEL rtl may reappear at this point, or perhaps we could just emit a label at the beginning of every basic block -- any places where the label isn't needed, the blocks should have been fused anyway.
Delay-slot scheduling has to run after this, because the number of available delay slots may depend on exactly which branch instruction gets picked, but should not need to use the bizarre SEQUENCE representation that the current one does (because the basic block graph would be preserved, so [gccsource:final.c] won't be confused by jump insns that don't come last in their basic block).