This is the mail archive of the mailing list for the GCC project.


Re: [tree-ssa] AST optimizer in C++?

> > with it.  Hopefully I can convince you that tree's are not really
> > neccesary at all.  :) :)

> Agree, since we deal with (more or less) the same amount of information
> structured differently: trees vs. list of instructions.

Very true.  Although probably well known, lists of instructions have lots
of advantages as well (over trees):

* Better cache behavior: Tree representations tend to have lots of
  dynamically allocated nodes to represent the equivalent amount of
  information held by a single "instruction".  For example, a single "add"
  instruction can be represented with a single dynamically allocated node
  in list-of-instructions form, but requires at least 3 (and possibly
  more) nodes in a tree representation: one for the instruction, and two
  for the operands.
* Smaller representation: A side effect of the above... with fewer
  allocations and fewer nodes, the total of the per-node overheads is much
  smaller.
* Simpler to deal with: There are _no_ sharing/unsharing or "amount of
  lowering" issues to deal with in a simple list-of-instructions form.

These are some of the reasons that I think that the SIMPLE representation
could be made better.  That said, SIMPLE _is_ 90% of the way there, and a
HUGE step up from working on raw language dependent trees or RTL.

> AST representation as it exists now has the following information transformed
> directly from the input source code:
> - control flow
> - def-use
> - caller-callee
> - types

Yes, all of which are very important for one part of the compiler or
another.  See below for how LLVM addresses these.

> Intermediate representations are created for better handling all this information.
> IR can be viewed as sub-languages of the original language:
> - CFG is a the sub-language that contains information about the control flow
> - SSA and dependence graphs for def-use
> - Call-graph for caller-callee
> - symbol tables for types

Very true.

> These IR expose the same information as the original source code but in a more
> compact form to optimizers.

And, more importantly, it should be an easier to manipulate form.  The AST
is an important representation that has to deal with partial information
(for example before type checking/propagation has finished), and
potentially erroneous input.  A well designed mid-level IR doesn't have to
deal with these issues, so the representation used can be simplified a
lot, making it much easier to transform.

> OTOH this information is not ready to be used dirrectly by optimizers since IRs
> can produce different representations for the same semantic-equivalent program.

I'm not really sure what you mean by this.  In my view, it is always best
to minimize the number of redundancies and the number of cases that must be
handled in the representation (ie, by lowering ++X to use primitives, you
don't have to handle the ++ case anymore).  I'm not sure if this is what
you're getting at, but if so, I certainly agree.

> This increase the complexity of optimizers.  The idea is to reduce a class of
> semantic-equivalent programs to a single representative program.  This is the
> normalization process:
> - SIMPLE normalizes def-use IRs
> - goto/break/switch elimination and loop canonicalization aim to normalize
>   control flow.
> - ??? normalization for caller-callee
> - ??? normalization for types
> (I have no idea yet about question marks... work in progress...)

Yes, that is how it currently stands.

>   Low Level Virtual Machine (LLVM) defines a virtual instruction set that
> contains exactly the same amount of information as the AST representation and
> it has all the benefits of the three-address-code normalization.

It actually covers all of the bases you list above pretty well, more
detail below.

>   Thus, I agree that writing optimizers on LLVM is as difficult :-) as writing
> them on trees under SIMPLE form.
>   A good idea I saw in the LLVM is that the SSA infromation is stored directly
> in the operations stream: ie. (from LLVMCompilationStrategy.pdf)
> <result> = phi <type> [<val0>, <label0>], ..., [<valN>, <labelN>]

This example that you picked out is just one of a variety of different
ways that LLVM is designed to simplify optimizer design.  Let me explain
how LLVM represents the information that you listed above:

> - SSA and dependence graphs for def-use

This is certainly the easiest one to explain.  LLVM does use SSA, and
as the example you showed above illustrates, the SSA information is
reflected __directly__ in the instructions themselves.  This occurs in
even more subtle ways than the PHI node instruction.  Some background is
needed first.

In our current LLVM implementation, we have the following C++ classes
[you can see the full class hierarchy at the bottom of the page, if you are
so inclined]:

Value  []
  This class is the base class of all things that can be used as an
  operand to an instruction.  For example, there are subclasses
  "Constant" and "Argument" to represent constants and formal parameters
  to functions.  The Value class keeps track of a list of "User"s [below],
  which are all of the objects that use the LLVM value.  This provides the
  def-use chain information.

User  []
  User is a subclass of Value which represents a Value that can refer to
  other LLVM values [for example Instruction, below].  The main feature of
  User is the operands list, which keeps pointers to all of the Value's
  that are used by this User.  This information represents the SSA use-def
  information.  Of course, the C++ implementation provides a nice
  interface that makes it impossible for the use-def & def-use information
  to get out of date...

Instruction []
  Instruction is a subclass of User.  It provides information about the
  opcode of the instruction, and some other helpful instruction type
  stuff, in addition to inheriting functionality from its base classes.

So far, you can see how the def-use & use-def SSA information is tracked.
The crucial thing to note here though is that _Instruction is a subclass
of Value_.  This means that when you have the following LLVM code:

  %A = add int %X, %Y
  %B = sub int %A, 0

... that there is a pointer in the operands list for the %B instruction
that points directly to the Add instruction itself.  There are no
intermediates or other references.  There are no tables on the side that
are necessary to update this information.

This makes the representation dramatically simpler to update and
transform.  For example, to optimize away the second instruction there, we
do something similar to the following:

if (B->getOpcode() == Instruction::Sub)
  if (B->getOperand(1) == ConstantZero)
    B->replaceAllUsesWith(B->getOperand(0));

The replaceAllUsesWith method is a member of the Value class
[] that loops over all
of the Users of the value, changing all instances of "this" in their
operand list to instead refer to the argument of replaceAllUsesWith.

Note that this automatically keeps the SSA information up-to-date.  There
are no external tables to reference, reconstruct or update.  This makes it
incredibly easy to write transformations.
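
To make this concrete, here is a tiny C++ sketch of the Value/User/
Instruction layering and of replaceAllUsesWith.  This is a simplified
model for illustration only, not the real LLVM class definitions; member
names like addOperand are invented:

```cpp
// Simplified sketch of LLVM's Value/User/Instruction layering (NOT the
// real headers).  Value tracks its users (def-use); User tracks its
// operands (use-def); the two lists are kept in sync.
#include <algorithm>
#include <string>
#include <vector>

struct User;  // forward declaration

struct Value {
  std::string Name;
  std::vector<User*> Uses;            // def-use: every User of this Value
  explicit Value(std::string N) : Name(std::move(N)) {}
  virtual ~Value() = default;

  // Rewrite every use of 'this' to refer to V instead.
  void replaceAllUsesWith(Value *V);
};

struct User : Value {
  std::vector<Value*> Operands;       // use-def: the Values this User reads
  using Value::Value;
  void addOperand(Value *V) {
    Operands.push_back(V);
    V->Uses.push_back(this);          // both directions stay in sync
  }
};

void Value::replaceAllUsesWith(Value *V) {
  for (User *U : Uses)                // patch every user's operand list
    std::replace(U->Operands.begin(), U->Operands.end(),
                 static_cast<Value*>(this), V);
  V->Uses.insert(V->Uses.end(), Uses.begin(), Uses.end());
  Uses.clear();                       // nothing refers to 'this' any more
}

struct Instruction : User {
  enum Opcode { Add, Sub } Op;
  Instruction(Opcode O, std::string N) : User(std::move(N)), Op(O) {}
};
```

With these classes, folding "%B = sub %A, 0" into %A is exactly the
replaceAllUsesWith call shown above: no side tables need updating.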

> - symbol tables for types

LLVM also directly reflects this _in_ the representation just like it does
the SSA representation.  LLVM is a typed
[] representation,
which requires that a type exist on every value and that certain type
constraints always hold (for example, the two operands to an 'add' or
'sub' instruction must have the same type).  The type representation is truly
language independent, by mapping high-level concepts like classes down
into low-level concepts like structures.

Because every LLVM value must have a type, the Value class contains a
Type* field, recording the type of every value.  Because of this, types
are easily and quickly available for use.  This makes "easy things easy",
and also quite efficient:

if (B->getOpcode() == Instruction::Sub)
  if (!B->getType()->isFloatingPoint())   // Be careful with NaN's?
    if (B->getOperand(1) == ConstantZero)
      B->replaceAllUsesWith(B->getOperand(0));

Note that this representation is equivalent to how RTL stores the Mode
information in all of the "values" it uses.  Here, however, the type
system is much more powerful (and useful).  My key point is that this
information is kept IN the representation itself, not along-side in a
table that is:

 A. Slow - in the best case, accessing it is a hash table lookup vs a
    simple pointer indirection
 B. Large - again, hash tables have overhead
 C. Painful - The symbol table must always be kept up-to-date.

> - CFG is a the sub-language that contains information about the control flow

Keeping with my story, LLVM also represents the CFG directly in the
representation, just like everything else we have discussed.  In
particular, there is a subclass of Value named BasicBlock
[].  BasicBlock's
obviously contain a list of instructions in the block, but they themselves
are also Value's (of Type 'label').  Because of this, 'BasicBlock*'s may
exist right in the Operand list for the instructions themselves, although
there are only a few instructions that can use operands of type 'label'.
An example of some LLVM code:

   br label %bb2
bb2:
   %reg110 = phi uint [ %reg112, %bb2 ], [ 1, %bb1 ]
   %reg112 = add uint %reg110, 1
   %cond218 = setne uint %reg112, 10
   br bool %cond218, label %bb2, label %bb3

In this short example, basic block %bb2 is an operand for the PHI
instruction and for the two branch instructions.  A pointer _to the basic
block_ itself exists in all of these locations.

This is great, but it doesn't really answer how we get a CFG out of this.
The simplest CFG interface can tell you all of the incoming and outgoing
edges of a block.  To do this, we have iterators that wrap the following
[trivial] analyses:

To get the successors of a block:
  One invariant of LLVM is that every basic block must end with a
  "Terminator" instruction
  [].  Terminator
  instructions are basically control flow instructions
  (conditional/unconditional branch, return, plus a few others to be fully
  general).  These instructions know all possible destinations that they
  may branch to, and therefore expose the "successor" interface used by
  the succ_iterator.

To get predecessors of a block:
  Because BasicBlock is a subclass of Value, it automatically keeps track
  of all of the User's of the block.  There are users of a basic block
  only in the following cases:
    1. A phi instruction, these can be ignored.
    2. Terminator instructions.  In this case, the basic block the
       terminator lives in is a predecessor.

  Thus, the pred_iterator basically loops over the users of the basic
  block, looking for terminator instructions.
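
The two iterators above can be sketched in a few lines of C++.  Again,
this is a simplified model, not the real LLVM API; addEdge and the member
names are invented for illustration:

```cpp
// Sketch: successors come from the block's terminator; predecessors come
// from scanning the block's users for terminator instructions.
#include <vector>

struct BasicBlock;

// A terminator knows every block it can branch to.
struct TermInst {
  BasicBlock *Parent = nullptr;       // the block this terminator lives in
  std::vector<BasicBlock*> Succs;
};

struct BasicBlock {
  std::vector<TermInst*> Users;       // users of this block's 'label' value
  TermInst Term;                      // every block ends in a terminator
};

// Link Src -> Dst: record the successor edge and the corresponding use.
void addEdge(BasicBlock &Src, BasicBlock &Dst) {
  Src.Term.Parent = &Src;
  Src.Term.Succs.push_back(&Dst);
  Dst.Users.push_back(&Src.Term);
}

// succ_iterator equivalent: just ask the terminator.
const std::vector<BasicBlock*> &successors(const BasicBlock &BB) {
  return BB.Term.Succs;
}

// pred_iterator equivalent: walk the users; the block a terminator lives
// in is a predecessor.  (PHI users would be skipped here.)
std::vector<BasicBlock*> predecessors(const BasicBlock &BB) {
  std::vector<BasicBlock*> Preds;
  for (TermInst *T : BB.Users)
    Preds.push_back(T->Parent);
  return Preds;
}
```

Note that neither direction needs a separate CFG data structure: both
queries fall out of the operand and use lists.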

This just leaves one feature lacking from the CFG support:

>   However LLVM does not provide support for high level loop constructs
> (nested loops for example).  It defines instead branch "br" instructions.
> You then need extra informations for loop nest optimizers.

That is very true.  In fact we intentionally do not provide these
facilities as part of LLVM itself, for several reasons (below).  To
provide LoopInfo [], we use the LLVM Pass infrastructure (described in
detail in the "how to write a pass" document) to automate the construction
and deconstruction of loop information as necessary.  Our "loop
information" simply identifies natural loops and their nesting structure.

There are several important aspects of loop information that make it
different from other parts of the representation, warranting different
treatment [instead of being part of the representation itself, it gets
treated like alias analysis information or dominator information]:

1. Updating loop information is not trivial.  We want to make it easy to
   write transformations and analyses, to allow explorative programming.
   After a transformation is mature and finished, "optimizations" like
   incrementally updating loop information can be implemented if desired.
   If not implemented, however, the loop information is simply invalidated
   and recalculated as necessary.  Details are in the "how to write a
   pass" document.

2. Most transformations don't need it.  While it is certainly useful for a
   class of transformations (loop transformations for example, LICM for
   another), there are many that don't need it.  Making all of the
   transformations aware of all of the possible analyses is not a good
   idea.

The other option is to add a "Loop" construct to the intermediate
representation itself.  I'll try to explain why I think this is a bad
idea:

1. A low-level representation should be low-level.  This breaks the whole
   notion of a flat representation (in LLVM's case), and adds a special
   case that must be handled by anything that changes or analyzes the CFG.

2. A Loop construct is inherently limited by one of two things:
    A. It is not general enough to represent all of the possible loops
       that are able to be handled by a transformation.  For example, if
       you only support fortran style "DO" loops, then a recursive list
       prefetching transformation will have absolutely no use for it.
    B. It is TOO general.  Upon realizing that "A" has set in, you might
       make the Loop construct a lot more general.  For example, first you
       add the capability to increment the induction variable by a loop
       invariant amount, then you add the ability to not _have_ an
       induction variable at all.  Perhaps you recognize and represent
       irreducible loops as well or something.

       If you go down this path, then the Loop construct becomes
       meaningless.  Transformations become as complex as if they detected
       the loops themselves.  General natural loops (which can have
       multiple back edges, for example) become really nasty to represent
       as a "Loop" node, so the real generality desired is never
       implemented.

3. Cluttering up an intermediate representation with something that MAY be
   useful to some transformations is always a bad idea. It's better to
   focus on specific things to represent that are useful to nearly all
   transformations & code generation.  Representing loop information in
   the IR doesn't give you any advantage over representing it on the side,
   and it forces all clients of the IR to be aware of it.

> - Call-graph for caller-callee

This email is already WAY too long, so I'll cut this short.  The basic
gist of how LLVM represents this is pretty simple.  Function's
[] contain a list of
BasicBlock's, and themselves derive from Value.  Because of this, they
have def-use information just like everything else in LLVM.  The LLVM
'Call' instruction always takes a pointer to a function as an operand.
In the case of a direct call, then, this operand is a direct pointer to
the Function object itself.

For almost all purposes, this is a sufficient representation for
transformations that need "call graph" information [for example, inlining
does not need anything else].  No symbol-table lookup or other side
information is necessary to represent the caller-callee information.  In
more complex cases (for example, a transformation that does stuff with
indirect calls), the transformation can request
[] a
call graph object []
that contains this information.
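
The caller-callee pattern can be sketched the same way.  This is a
simplified, hypothetical model (Function's real definition contains much
more); callSitesOf is invented for illustration:

```cpp
// Sketch: because Function derives from Value, its use list directly
// gives every direct call site -- no symbol-table lookup needed.
#include <string>
#include <vector>

struct CallInst;

struct Function {
  std::string Name;
  std::vector<CallInst*> Uses;        // every place this function is used
};

struct CallInst {
  Function *Callee;                   // operand: direct pointer to Function
  explicit CallInst(Function *F) : Callee(F) { F->Uses.push_back(this); }
};

// All direct call sites of F are simply the users of F.
const std::vector<CallInst*> &callSitesOf(const Function &F) {
  return F.Uses;
}
```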

This covers the most common case quite nicely, and has other advantages as
well [it's trivial to find the call sites for a particular function, for
example].
> However I think that introducing yet another
> translation from trees to LLVM could be expensive in compiling time (yet it is
> a quick transformation from SIMPLE).

In my mind, one of two things would make sense:

1. Change SIMPLE to incorporate some of the better ideas from LLVM, while
   leaving stuff people don't like alone.
2. Replace SIMPLE with LLVM.  If LLVM looks nice, and having a mature C++
   implementation would also help, I certainly would not be opposed to
   this.  I'm sure many people would though, and I'm explicitly not
   pushing for this.

I do not, however, think that it makes sense to have both a "SIMPLE" and
an "LLVM" phase to the compiler; there is almost complete overlap between
the functionality provided, so there should be only one.

Do you have any idea what kinds of things are possible in simple that you
don't think would be possible (or pleasant) in LLVM?  I'd like to get an
idea if there is something that would be hard to do in LLVM, but is easy
in SIMPLE (to figure out how much overlap there is).

I'm really sorry for the length of this mail, but I think it helps explain
a bit of the philosophy behind LLVM...

