This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[tree-ssa] AST optimizer in C++?


On Fri, Aug 23, 2002 at 12:28:53PM -0500, Chris Lattner wrote:
> 
> Anyway, I'd love to discuss various IR design issues.  If anyone would
> like to talk about the Pros/cons of optimizing on a tree vs a low-level
> IR with type information, I can certainly talk about the experience I have
> with it.  Hopefully I can convince you that tree's are not really
> neccesary at all.  :) :)
> 

Agree, since we deal with (more or less) the same amount of information 
structured differently: trees vs. list of instructions.

AST representation as it exists now has the following information transformed 
directly from the input source code:
- control flow
- def-use 
- caller-callee
- types

Intermediate representations are created for better handling all this information.
IR can be viewed as sub-languages of the original language:
- CFG is a the sub-language that contains information about the control flow 
- SSA and dependence graphs for def-use
- Call-graph for caller-callee
- symbol tables for types
(see also 
http://www.stratego-language.org/twiki/bin/view/Stratego/StrategoPublications
http://www.stratego-language.org/twiki/bin/view/Transform/ProgramTransformation
)

These IR expose the same information as the original source code but in a more
compact form to optimizers.  

OTOH this information is not ready to be used dirrectly by optimizers since IRs
can produce different representations for the same semantic-equivalent program.
This increase the complexity of optimizers.  The idea is to reduce a class of
semantic-equivalent programs to a single representative program.  This is the 
normalization process:
- SIMPLE normalizes def-use IRs
- goto/break/switch elimination and loop canonicalization aim to normalize 
  control flow.
- ??? normalization for caller-callee
- ??? normalization for types
(I have no idea yet about question marks... work in progress...)

  Low Level Virtual Machine (LLVM) defines a virtual instruction set that 
contains exactly the same amount of information as the AST representation and 
it has all the benefits of the three-address-code normalization.
  However LLVM does not provide support for high level loop constructs 
(nested loops for example).  It defines instead branch "br" instructions.  
You then need extra informations for loop nest optimizers.
  Thus, I agree that writing optimizers on LLVM is as difficult :-) as writing 
them on trees under SIMPLE form.  However I think that introducing yet another 
translation from trees to LLVM could be expensive in compiling time (yet it is 
a quick transformation from SIMPLE).  

  A good idea I saw in the LLVM is that the SSA infromation is stored directly 
in the operations stream: ie. (from LLVMCompilationStrategy.pdf)
<result> = phi <type> [<val0>, <label0>], ..., [<valN>, <labelN>]

  Diego, maybe it is worth to include PHI nodes directly as a SIMPLE extension?
This could solve the problem with the tree_common.aux field...  The only difficult
bit I see is to remove them before we generate RTL (or otherwise propagate 
information about SSA into RTL through the translation of these nodes).

What do others think?


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]