This is the mail archive of the
mailing list for the GCC project.
[tree-ssa] AST optimizer in C++?
On Fri, Aug 23, 2002 at 12:28:53PM -0500, Chris Lattner wrote:
> Anyway, I'd love to discuss various IR design issues. If anyone would
> like to talk about the Pros/cons of optimizing on a tree vs a low-level
> IR with type information, I can certainly talk about the experience I have
> with it. Hopefully I can convince you that tree's are not really
> neccesary at all. :) :)
Agree, since we deal with (more or less) the same amount of information
structured differently: trees vs. list of instructions.
AST representation as it exists now has the following information transformed
directly from the input source code:
- control flow
Intermediate representations are created for better handling all this information.
IR can be viewed as sub-languages of the original language:
- CFG is a the sub-language that contains information about the control flow
- SSA and dependence graphs for def-use
- Call-graph for caller-callee
- symbol tables for types
These IR expose the same information as the original source code but in a more
compact form to optimizers.
OTOH this information is not ready to be used dirrectly by optimizers since IRs
can produce different representations for the same semantic-equivalent program.
This increase the complexity of optimizers. The idea is to reduce a class of
semantic-equivalent programs to a single representative program. This is the
- SIMPLE normalizes def-use IRs
- goto/break/switch elimination and loop canonicalization aim to normalize
- ??? normalization for caller-callee
- ??? normalization for types
(I have no idea yet about question marks... work in progress...)
Low Level Virtual Machine (LLVM) defines a virtual instruction set that
contains exactly the same amount of information as the AST representation and
it has all the benefits of the three-address-code normalization.
However LLVM does not provide support for high level loop constructs
(nested loops for example). It defines instead branch "br" instructions.
You then need extra informations for loop nest optimizers.
Thus, I agree that writing optimizers on LLVM is as difficult :-) as writing
them on trees under SIMPLE form. However I think that introducing yet another
translation from trees to LLVM could be expensive in compiling time (yet it is
a quick transformation from SIMPLE).
A good idea I saw in the LLVM is that the SSA infromation is stored directly
in the operations stream: ie. (from LLVMCompilationStrategy.pdf)
<result> = phi <type> [<val0>, <label0>], ..., [<valN>, <labelN>]
Diego, maybe it is worth to include PHI nodes directly as a SIMPLE extension?
This could solve the problem with the tree_common.aux field... The only difficult
bit I see is to remove them before we generate RTL (or otherwise propagate
information about SSA into RTL through the translation of these nodes).
What do others think?