Re: [tree-ssa] AST optimizer in C++?
- From: Chris Lattner <sabre at nondot dot org>
- To: Pop Sébastian <pop at gauvain dot u-strasbg dot fr>
- Cc: <dnovillo at redhat dot com>, <gcc at gcc dot gnu dot org>
- Date: Sun, 25 Aug 2002 16:36:59 -0500 (CDT)
- Subject: Re: [tree-ssa] AST optimizer in C++?
> > with it. Hopefully I can convince you that trees are not really
> > necessary at all. :) :)
> Agreed, since we deal with (more or less) the same amount of information
> structured differently: trees vs. lists of instructions.
Very true. Although probably well known, lists of instructions have lots
of advantages as well (over trees):
* Better cache behavior: Tree representations tend to have lots of
dynamically allocated nodes to represent the equivalent amount of
information held by a single "instruction". For example, a single "add"
instruction can be represented with a single dynamically allocated node
in list-of-instructions form, but requires at least 3 (and possibly
more) nodes in a tree representation: one for the instruction, and two
for the operands (a rough sketch of the two layouts follows this list).
* Smaller representation: A side effect of the above... with fewer
allocations and fewer nodes, the total per-node overhead is much
smaller.
* Simpler to deal with: There are _no_ sharing/unsharing or "amount of
lowering" issues to deal with in a simple list-of-instructions
representation.
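To make the size point concrete, here is a purely illustrative sketch
(these are invented layouts, not GCC's or LLVM's actual data structures)
of why "a = b + c" costs one record in a list-of-instructions form but
several heap nodes in a tree form:

  // Illustrative only -- invented layouts, not real GCC/LLVM structures.
  struct TreeNode {              // tree form: one node per operator/operand;
    int code;                    //   e.g. an ASSIGN, PLUS, or VAR code
    TreeNode *ops[2];            //   "a = b + c" needs an assign node, a plus
  };                             //   node, and leaf nodes, each heap-allocated

  struct ThreeAddressInstr {     // list form: one record holds it all
    int opcode;                  //   e.g. ADD
    int dest, src1, src2;        //   operands referenced directly
  };

Fewer allocations also means the nodes that do exist are far more likely
to land next to each other in memory.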
These are some of the reasons that I think that the SIMPLE representation
could be made better. That said, SIMPLE _is_ 90% of the way there, and a
HUGE step up from working on raw language dependent trees or RTL.
> AST representation as it exists now has the following information transformed
> directly from the input source code:
> - control flow
> - def-use
> - caller-callee
> - types
Yes, all of which are very important for one part of the compiler or
another. See below for how LLVM addresses these.
> Intermediate representations are created to handle all this information better.
> IR can be viewed as sub-languages of the original language:
> - CFG is the sub-language that contains information about the control flow
> - SSA and dependence graphs for def-use
> - Call-graph for caller-callee
> - symbol tables for types
Very true.
> These IRs expose the same information as the original source code but in a more
> compact form to optimizers.
And, more importantly, it should be an easier-to-manipulate form. The AST
is an important representation that has to deal with partial information
(for example, before type checking/propagation has finished) and with
potentially erroneous input. A well-designed mid-level IR doesn't have to
deal with these issues, so the representation used can be simplified a
lot, making it much easier to transform.
> OTOH this information is not ready to be used directly by optimizers since IRs
> can produce different representations for the same semantic-equivalent program.
I'm not really sure what you mean by this. In my view, it is always best
to reduce the number of redundancies and the number of cases that must be
handled in the representation (i.e., by lowering ++X to primitives, you
don't have to handle the ++ case anymore). I'm not sure if this is what
you're getting at, but if so, I certainly agree.
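As a toy illustration (not SIMPLE's actual lowering rules, just the idea),
here is roughly what lowering "return ++x;" to primitives leaves behind:

  // Hypothetical lowered form of "return ++x;": only a plain add and
  // copies remain, so no later pass ever needs a special case for "++".
  int lowered(int x) {
    int t1 = x + 1;   // the increment, as an ordinary add
    x = t1;           // write the incremented value back to x
    return t1;        // the value of the ++x expression is just t1
  }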
> This increases the complexity of optimizers. The idea is to reduce a class of
> semantic-equivalent programs to a single representative program. This is the
> normalization process:
> - SIMPLE normalizes def-use IRs
> - goto/break/switch elimination and loop canonicalization aim to normalize
> control flow.
> - ??? normalization for caller-callee
> - ??? normalization for types
> (I have no idea yet about question marks... work in progress...)
Yes, that is how it currently stands.
> Low Level Virtual Machine (LLVM) defines a virtual instruction set that
> contains exactly the same amount of information as the AST representation and
> it has all the benefits of the three-address-code normalization.
It actually covers all of the bases you list above pretty well; more
detail below.
> Thus, I agree that writing optimizers on LLVM is as difficult :-) as writing
> them on trees under SIMPLE form.
...
> A good idea I saw in the LLVM is that the SSA information is stored directly
> in the operations stream: ie. (from LLVMCompilationStrategy.pdf)
> <result> = phi <type> [<val0>, <label0>], ..., [<valN>, <labelN>]
This example that you picked out is just one of a variety of different
ways that LLVM is designed to simplify optimizer design. Let me explain
how LLVM represents the information that you listed above:
> - SSA and dependence graphs for def-use
This is certainly the easiest one to explain. LLVM does use SSA, and
as the example you showed above illustrates, the SSA information is
reflected __directly__ in the instructions themselves. This occurs in
even more subtle ways than the PHI node instruction. Some background is
necessary.
In our current LLVM implementation, we have the following C++ classes
[you can see the full class hierarchy at the bottom of the
http://llvm.cs.uiuc.edu/doxygen/inherits.html page, if you are so
inclined]:
Value [http://llvm.cs.uiuc.edu/doxygen/classValue.html]
This class is the base class of all things that can be used as an
operand to an instruction. For example, there are subclasses
"Constant" and "Argument" to represent constants and formal parameters
to functions. The Value class keeps track of a list of "User"s [below],
which are all of the objects that use the LLVM value. This provides the
def-use chain information.
User [http://llvm.cs.uiuc.edu/doxygen/classUser.html]
User is a subclass of Value which represents a Value that can refer to
other LLVM values [for example Instruction, below]. The main feature of
User is the operands list, which keeps pointers to all of the Values
that are used by this User. This information represents the SSA use-def
information. Of course, the C++ implementation provides a nice
interface that makes it impossible for the use-def & def-use information
to get out of date...
Instruction [http://llvm.cs.uiuc.edu/doxygen/classInstruction.html]
Instruction is a subclass of User. It provides information about the
opcode of the instruction, and some other helpful instruction type
stuff, in addition to inheriting functionality from its base classes.
So far, you can see how the def-use & use-def SSA information is tracked.
The crucial thing to note here though is that _Instruction is a subclass
of Value_. This means that when you have the following LLVM code:
  %A = add int %X, %Y
  %B = sub int %A, 0
... that there is a pointer in the operands list for the %B instruction
that points directly to the Add instruction itself. There are no
intermediates or other references. There are no side tables that must be
kept up to date for this information.
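Just as a minimal sketch of how that pointer graph gets consumed (the
exact spellings I use here -- Value::users(), getNumOperands() and
friends -- are assumptions on my part, so take them as illustrative
rather than as the precise interface):

  #include "llvm/IR/Instruction.h"
  #include "llvm/IR/Value.h"
  #include "llvm/Support/raw_ostream.h"
  using namespace llvm;

  // Walk the SSA graph around one instruction in both directions.
  void inspect(Instruction *B) {
    // use-def: each operand is a direct pointer to the defining Value
    // (for %B above, operand 0 is literally the 'add' instruction %A).
    for (unsigned i = 0, e = B->getNumOperands(); i != e; ++i)
      errs() << "operand " << i << ": " << *B->getOperand(i) << "\n";

    // def-use: the Value base class keeps the list of Users for us.
    for (User *U : B->users())
      errs() << "  used by: " << *U << "\n";
  }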
This makes the representation dramatically simpler to update and
transform. For example, to optimize away the second instruction there, we
do something similar to the following:
  if (B->getOpcode() == Instruction::Sub)
    if (B->getOperand(1) == ConstantZero)
      B->replaceAllUsesWith(B->getOperand(0));
The replaceAllUsesWith method is a member of the Value class
[http://llvm.cs.uiuc.edu/doxygen/classValue.html#a9] that loops over all
of the Users of the value, changing all instances of "this" in their
operand list to instead refer to the argument of replaceAllUsesWith.
Note that this automatically keeps the SSA information up-to-date. There
are no external tables to reference, reconstruct, or update. This makes it
incredibly easy to write transformations.
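To show how little machinery a transformation needs, here is a minimal
sketch of that same "sub X, 0 ==> X" fold run over a whole function. The
iteration style and a couple of helpers (dyn_cast, Constant::isNullValue,
eraseFromParent) are assumed spellings rather than a precise quote of our
headers, but the point stands: there is no side table to keep in sync.

  #include <vector>
  #include "llvm/IR/Constants.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // Replace every "sub X, 0" in F with X and delete the dead sub.
  void foldSubZero(Function &F) {
    std::vector<Instruction *> Dead;
    for (BasicBlock &BB : F)
      for (Instruction &I : BB)
        if (I.getOpcode() == Instruction::Sub)
          if (Constant *C = dyn_cast<Constant>(I.getOperand(1)))
            if (C->isNullValue()) {
              // Rewires every user of I to use X; SSA info stays consistent.
              I.replaceAllUsesWith(I.getOperand(0));
              Dead.push_back(&I);   // erase after the walk, not during it
            }
    for (Instruction *I : Dead)
      I->eraseFromParent();
  }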
> - symbol tables for types
LLVM also directly reflects this _in_ the representation just like it does
the SSA representation. LLVM is a typed
[http://llvm.cs.uiuc.edu/docs/LangRef.html#typesystem] representation,
which requires that types exist on all values and that certain type
constraints always hold (for example, the two operands to an 'add' or
'sub' instruction must have the same type). The type representation is
truly language independent: high-level concepts like classes are mapped
down into low-level concepts like structures.
Because every LLVM value must have a type, the Value class contains a
Type* field, recording the type of every value. Because of this, types
are easily and quickly available for use. This makes "easy things easy",
and also quite efficient:
  if (B->getOpcode() == Instruction::Sub)
    if (!B->getType()->isFloatingPoint())    // Be careful with NaN's?
      if (B->getOperand(1) == ConstantZero)
        B->replaceAllUsesWith(B->getOperand(0));
Note that this representation is equivalent to how RTL stores the Mode
information in all of the "values" it uses. Here, however, the type
system is much more powerful (and useful). My key point is that this
information is kept IN the representation itself, not alongside it in a
table that is:
A. Slow - in the best case, accessing it is a hash table lookup vs a
simple pointer indirection
B. Large - again, hash tables have overhead
C. Painful - The symbol table must always be kept up-to-date.
> - CFG is the sub-language that contains information about the control flow
Keeping with my story, LLVM also represents the CFG directly in the
representation, just like everything else we have discussed. In
particular, there is a subclass of Value named BasicBlock
[http://llvm.cs.uiuc.edu/doxygen/classBasicBlock.html]. BasicBlocks
obviously contain a list of instructions in the block, but they themselves
are also Values (of type 'label'). Because of this, 'BasicBlock*'s may
exist right in the operand list for the instructions themselves, although
there are only a few instructions that can use operands of type 'label'.
An example of some LLVM code:
    br label %bb2
  bb2:
    %reg110 = phi uint [ %reg112, %bb2 ], [ 1, %bb1 ]
    %reg112 = add uint %reg110, 1
    %cond218 = setne uint %reg112, 10
    br bool %cond218, label %bb2, label %bb3
  bb3:
In this short example, basic block %bb2 is an operand for the PHI
instruction and for the two branch instructions. A pointer _to the basic
block_ itself exists in all of these locations.
This is great, but it doesn't really answer how we get a CFG out of this.
The simplest CFG interface can tell you all of the incoming and outgoing
edges of a block. To do this, we have iterators that wrap the following
[trivial] analyses:
To get the successors of a block:
One invariant of LLVM is that every basic block must end with a
"Terminator" instruction
[http://llvm.cs.uiuc.edu/doxygen/classTerminatorInst.html]. Terminator
instructions are basically control flow instructions
(conditional/unconditional branch, return, plus a few others to be fully
general). These instructions know all possible destinations that they
may branch to, and therefore expose the "successor" interface used by
the succ_iterator.
To get predecessors of a block:
Because BasicBlock is a subclass of Value, it automatically keeps track
of all of the Users of the block. There are users of a basic block
only in the following cases:
1. A phi instruction; these can be ignored.
2. Terminator instructions. In this case, the basic block the
terminator lives in is a predecessor.
Thus, the pred_iterator basically loops over the users of the basic
block, looking for terminator instructions.
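In code, using these looks something like the sketch below (take the
exact header and iterator spellings as assumptions on my part; the shape
is what matters):

  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/CFG.h"
  #include "llvm/Support/raw_ostream.h"
  using namespace llvm;

  // Print the outgoing and incoming CFG edges of one block.
  void printEdges(BasicBlock *BB) {
    // Successors come straight from the block's terminator instruction.
    for (succ_iterator SI = succ_begin(BB), E = succ_end(BB); SI != E; ++SI)
      errs() << "succ: " << (*SI)->getName() << "\n";
    // Predecessors come from walking the Users of the block, keeping the
    // blocks that the terminator-instruction users live in.
    for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI)
      errs() << "pred: " << (*PI)->getName() << "\n";
  }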
This just leaves one feature lacking from the CFG support:
> However LLVM does not provide support for high level loop constructs
> (nested loops for example). It defines instead branch "br" instructions.
> You then need extra information for loop nest optimizers.
That is very true. In fact we intentionally do not provide these
facilities as part of LLVM itself, for several reasons (below). To
provide LoopInfo [http://llvm.cs.uiuc.edu/doxygen/classLoopInfo.html], we
use the LLVM Pass infrastructure (described in detail in:
http://llvm.cs.uiuc.edu/docs/WritingAnLLVMPass.html) to automate the
construction and deconstruction of loop information as necessary. Our
"loop information" simply identifies Natural loops and their nesting
properties.
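A minimal sketch of a client pass, in the style of the WritingAnLLVMPass
document (the exact spellings below, e.g. the LoopInfoWrapperPass wrapper,
are assumptions rather than a quote of the real headers, and the pass
registration boilerplate is omitted):

  #include "llvm/Analysis/LoopInfo.h"
  #include "llvm/Pass.h"
  #include "llvm/Support/raw_ostream.h"
  using namespace llvm;

  namespace {
  struct MyLoopOpt : public FunctionPass {
    static char ID;
    MyLoopOpt() : FunctionPass(ID) {}

    // Declare the analyses we need; the pass manager builds them for us
    // and recomputes them when a transformation invalidates them.
    void getAnalysisUsage(AnalysisUsage &AU) const override {
      AU.addRequired<LoopInfoWrapperPass>();
    }

    bool runOnFunction(Function &F) override {
      LoopInfo &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
      for (Loop *L : LI)   // top-level natural loops; nesting hangs off L
        errs() << "loop with " << L->getSubLoops().size()
               << " immediate sub-loops\n";
      return false;        // report whether the function was modified
    }
  };
  } // end anonymous namespace
  char MyLoopOpt::ID = 0;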
There are several important aspects of loop information that make it
different from other parts of the representation, warranting different
treatment [instead of being part of the representation itself, it gets
treated like alias analysis information or dominator information]:
1. Updating loop information is not trivial. We want to make it easy to
write transformations and analyses, to allow explorative programming.
After a transformation is mature and finished, "optimizations" like
incrementally updating loop information can be implemented if desired.
If not implemented, however, the loop information is simply invalidated
and recalculated as necessary. Details are in the "how to write a
pass" document.
2. Most transformations don't need it. While it is certainly useful for a
class of transformations (loop transformations for example, LICM for
another), there are many that don't need it. Making all of the
transformations aware of all of the possible analyses is not a good
idea.
The other option is to add a "Loop" construct to the intermediate
representation itself. I'll try to explain why I think this is a bad
idea:
1. A low-level representation should be low-level. This breaks the whole
notion of a flat representation (in LLVM's case), and adds a special
case that must be handled by anything that changes or analyzes the CFG.
2. A Loop construct is inherently limited by one of two things:
A. It is not general enough to represent all of the possible loops
that a transformation can handle. For example, if you only
support Fortran-style "DO" loops, then a recursive list
prefetching transformation will have absolutely no use for it.
B. It is TOO general. Upon realizing that "A" has set in, you might
make the Loop construct a lot more general. For example, first you
add the capability to increment the induction variable by a loop
invariant amount, then you add the ability to not _have_ an
induction variable at all. Perhaps you recognize and represent
irreducible loops as well or something.
If you go down this path, then the Loop construct becomes
meaningless. Transformations become as complex as if they detected
the loops themselves. General natural loops (which can have multiple
back edges, for example) become a really nasty thing to represent as
a "Loop" node, so the real generality desired is never implemented.
3. Cluttering up an intermediate representation with something that MAY be
useful to some transformations is always a bad idea. It's better to
focus on specific things to represent that are useful to nearly all
transformations & code generation. Representing loop information in
the IR doesn't give you any advantage over representing it on the side,
and it forces all clients of the IR to be aware of it.
> - Call-graph for caller-callee
This email is already WAY too long, so I'll cut this short. The basic
gist of how LLVM represents this is pretty simple. Functions
[http://llvm.cs.uiuc.edu/doxygen/classFunction.html] contain a list of
BasicBlocks, and themselves derive from Value. Because of this, they
have def-use information just like everything else in LLVM. The LLVM
'Call' instruction always takes a pointer to a function as an operand.
In the case of a direct call, then, this operand is a direct pointer to
the Function object itself.
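As a quick sketch of what that buys (the getCalledFunction() spelling
here is an assumption on my part, not a quote of the header), the
"is this a direct call to F?" question is literally a pointer compare:

  #include "llvm/IR/Function.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // True iff CI is a direct call whose callee operand is exactly F.
  bool isDirectCallTo(const CallInst *CI, const Function *F) {
    // getCalledFunction() returns the Function operand of a direct call,
    // or null when the call goes through an arbitrary pointer value.
    return CI->getCalledFunction() == F;
  }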
For almost all purposes, this is a sufficient representation for
transformations that need "call graph" information [for example, inlining
does not need anything else]. No symbol-table lookup or other side
information is necessary to represent the caller-callee information. In
more complex cases (for example, a transformation that does stuff with
indirect calls), the transformation can request
[http://llvm.cs.uiuc.edu/docs/WritingAnLLVMPass.html#getAnalysisUsage] a
call graph object [http://llvm.cs.uiuc.edu/doxygen/classCallGraph.html]
that contains this information.
This covers the most common case quite nicely, and has other advantages as
well [it's trivial to find the call sites for a particular function,
etc].
> However I think that introducing yet another
> translation from trees to LLVM could be expensive in compiling time (yet it is
> a quick transformation from SIMPLE).
In my mind, one of two things would make sense:
1. Change SIMPLE to incorporate some of the better ideas from LLVM, while
leaving stuff people don't like alone.
2. Replace SIMPLE with LLVM. If LLVM looks nice, and having a mature C++
implementation would also help, I certainly would not be opposed to
this. I'm sure many people would though, and I'm explicitly not
pushing for this.
I do not, however, think that it makes sense to have both a "SIMPLE" and
an "LLVM" phase in the compiler; there is almost complete overlap between
the functionality provided, so there should be only one.
Do you have any idea what kinds of things are possible in SIMPLE that you
don't think would be possible (or pleasant) in LLVM? I'd like to get an
idea if there is something that would be hard to do in LLVM, but is easy
in SIMPLE (to figure out how much overlap there is).
I'm really sorry for the length of this mail, but I think it helps explain
a bit of the philosophy behind LLVM...
-Chris
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/