This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: AST optimizer in C++?
- From: Chris Lattner <sabre at nondot dot org>
- To: Pop Sébastian <pop at gauvain dot u-strasbg dot fr>
- Cc: <gcc at gcc dot gnu dot org>
- Date: Fri, 23 Aug 2002 12:28:53 -0500 (CDT)
- Subject: Re: AST optimizer in C++?
> > As long as we are plugging compiler infrastructures, I thought I might
> > mention my own project LLVM: http://llvm.cs.uiuc.edu/
> I had a look at the project, and I'm interested to have a look at the sources
> (if it's possible).
As I says on the web page, we aren't really ready for a public "release"
yet. That said, the doxygen documentation is all online, so you are free
to browse the source:
http://llvm.cs.uiuc.edu/doxygen/
I'll get back to the list this afternoon with something more decisive
about making tarballs public.
> > If you have any questions about LLVM or would like to discuss compiler
> > architecture, I'd be more than happy to participate.
> After what I (quickly) read from the web page you use gcc's front-ends.
> My questions are:
> - Do you have a class hierarchy for representing statements and expressions?
Sorta. If you take a look at this page:
http://llvm/doxygen/hierarchy.html or http://llvm/doxygen/inherits.html
you can see the (textual or graphical) class heirarchy for entire compiler
suite. Search for "Value" and you can see the root of the tree you're
interested in.
Now this is a "sorta" because we don't have statements and expressions.
:) LLVM is a low level three address code type representation, in SSA
form, with _type information_ for all operands. One of our research
results has been to show that most of the optimizations traditionally done
on AST's can be done just as well on low level representations, *as long
as you have rich type information*. In the context of LLVM then, this
means that statements and expressions are both instructions, it's just
that "statements" generally return void, ie they don't produce a value.
If you'd like me to turn some C code into LLVM code to be more concrete,
I'd be more than happy to. It really is a nice representation for
implementating optimizations (see the subclasses of the "Pass" class in
the inheritance diagram for some of the transformations we have
implemented). There is quite a bit of documentation on the web page (and
it's growing) talking about the LLVM representation and compiler process.
We are just starting to work on the "intro to coding LLVM stuff"
documents unfortunately.
> - Is it possible to generate GCC trees after your optimizations? (and then
> generate RTL by passing these trees to the RTL-translator...)
The basic architecture looks like this (this is covered in detail in this
document: http://llvm.cs.uiuc.edu/pubs/LLVMCompilationStrategy.pdf):
The cc1 process:
GCC Parser -> Tree -> RTL -> RTLSSA -> RTLSSA2LLVM, output x.s (LLVM assembly)
The 'as' process:
parse x.s -> cleanup -> level raise -> optimize, output x.o (which is in LLVM bytecode)
the 'ld' process:
link *.o bytecode files, generate native code for sparc (postlink), output
x.s (sparc assembly)
Run native sparc assembler/linker. We also have a simple C backend as
well.
There are a couple of interesting things to note about this process:
1. We completely discard all of the type information provided to use by
the source language, and reconstruct what we need from the RTL.
This is a side-effect of us starting with a GCC snapshot from about 2
months before GCC 3.0 became official. With no tree-ssa branch to
start from, we decided to base our work around the RTL representation
(which was also important, because not every frontend used the Tree
representation). Although we currently only support the C frontend, we
plan to turn on the C++ frontend soon.
2. We have disabled almost all of the optimizations in GCC (basically all
the RTL optimizations, keeping the language specific optimizations,
like builtin optimizations). This is because the optimizations in the
GCC frontend were taking a lot more time to execute than they did when
reimplemented making use of the SSA and type information.
Additionally, some optimizations were only applied by the CSE pass,
which is limited to extended basic blocks, whereas all of our
optimizations are at least global. To be specific, compiling vecffe.c
from the SpecINT2000 GAP benchmark (a big file) gives the following
-ftime-report output when running with -O2 (showing that most
everything is disabled):
Execution times (seconds)
preprocessing : 0.31 ( 7%) usr 0.05 (25%) sys 0.53 ( 9%) wall
lexical analysis : 0.21 ( 4%) usr 0.03 (15%) sys 0.27 ( 5%) wall
parser : 1.05 (22%) usr 0.06 (30%) sys 1.27 (22%) wall
varconst : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
jump : 0.27 ( 6%) usr 0.00 ( 0%) sys 0.30 ( 5%) wall
if-conversion : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
convert to SSA : 0.65 (14%) usr 0.03 (15%) sys 0.70 (12%) wall
convert from SSA : 0.82 (17%) usr 0.01 ( 5%) sys 1.13 (19%) wall
final : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
symout : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
rest of compilation : 0.32 ( 7%) usr 0.01 ( 5%) sys 0.36 ( 6%) wall
RTL->LLVM code gen : 0.44 ( 9%) usr 0.00 ( 0%) sys 0.47 ( 8%) wall
LLVM ResolveOperands : 0.32 ( 7%) usr 0.00 ( 0%) sys 0.37 ( 6%) wall
LLVM Code Emission : 0.25 ( 5%) usr 0.01 ( 5%) sys 0.36 ( 6%) wall
TOTAL : 4.70 0.20 5.90
Note that the 3 LLVM passes happen between the convert to/from SSA
passes. toplev.c is pretty nastily hacked to disable this stuff (which
could certainly be fixed, of course). This is a debug build running on
slow machine, so the actual timing results shouldn't be taken to mean
anything. In particular, the 3 LLVM passes are not really written to be
efficient, and should be rewritten at some point.
3. We do in fact need a .md file for our target to specify what sort of
RTL to generate. Basically we just have completely generic operations,
with as few constraints as possible on the instructions, trying to get
unobfuscated code from the frontend. For example, our andhi3 pattern
looks like:
(define_insn "andhi3"
[(set (match_operand:HI 0 "register_operand" "=r")
(and:HI (match_operand:HI 1 "nonmemory_operand" "")
(match_operand:HI 2 "nonmemory_operand" "")))]
""
"andhi3 %1, %2, %0")
if anyone cares, I can make our GCC changes available, although it's
kindof a hack (let's just say I didn't know GCC very well when I
started :), and needs to be largely rewritten.
4. Keeping LLVM code until the link step lets us perform important
interprocedural optimizations at link time, for example alias analysis,
that have a big effect on machine code generation (for example,
scheduling). Also inlining, and many other optimizations are much more
powerful at link-time, as I'm sure everyone is keenly aware. To keep
the actual code generation step fast, we do as much optimization as
possible in the 'gccas' process as possible, including machine specific
optimization.
5. LLVM allows a full suite of optimizations on global objects as well as
the bodies of functions. This means it can completely eliminate unused
global variables, functions, etc, without problem. I believe that GCC
cannot currently do this in the backend (as part of generic
optimization) because it treats global variables, for example, as
immutable places in memory accessed by symbol_refs.
6. The LLVM optimizations are very fast (I can post numbers if you'd
like). Typically it takes as much time to run through cc0 as it does
to execute gccas. Although we do not have ALL of the optimizations
we'd like implemented for GCCAS, we do have a fair collection, executed
in this order:
* FunctionResolving - Resolves a call to 'int foo(...)' to 'int foo(int)'
if there wasn't a prototype for foo specified.
* globaldce - Remove unused static global variables &
functions, also removes unused 'externs'.
* deadtypeelimination - Cleanup the symbol table by removing unused
typedefs.
* constantmerge - Merge global variable constants (basically
.rodata) with identical contents together.
* deadinst elimination - Trivial dead code elimination
* raise allocations - Convert malloc/free calls to malloc/free instructions
* indvar simplify - Convert indvars to cannonical form
* Level raise - Recover type information
* promote mem2reg - Promote stack values to SSA registers.
* instcombining - Generalized peephole, val# pass, very fast.
* licm - Loop invariant code motion
* gcse - Global Common Subexpression Elimination
* sccp - Sparse conditional constant prop
* instcombine - Rerun value simplification
* adce - Aggressive dead code elimination
* simplifycfg - Merge and delete blocks.
These use various analyses to do their job. If you're interested, I can
also provide timing information for stuff if you'd like.
There are a couple of additional transformations that I would like added
to this collection: PRE, strength reduction, etc. They are being added
when I have time.
> In fact what I want for gcc is an AST optimizer that could be built independently
> of gcc's front-ends. This optimizer takes as input SIMPLE trees and its output
> is again gcc trees (a subset of SIMPLE, or even a superset of SIMPLE if we decide
> to promote some stmts/exprs to a higher level during optimization).
LLVM could certainly be used for something like this, but I think it would
fit better as an intermediate representation between SIMPLE (or generic
trees) and RTL. The current expansion infrastructure in GCC that expands
from tree to RTL could easily be adapted to convert from LLVM, and RTL
certainly doesn't need the type information available in LLVM. *shrug*
I'm not sure what converting back to trees would give us, although it's
probably possible.
> The overall schema could be:
>
> (GCC front-ends) -> (GENERIC trees) -> [(SIMPLE trees) -> (AST Optimizer)] ->
> -> (RTL conversion) -> (Code generation)
There are many ways to do it, I think it would work best as:
(GCC front-ends) -> (GENERIC trees) -> LLVM -> [optimiations] -> (RTL conversion)
-> (Code generation)
but a mix of approaches would certainly have advantages and disadvantages.
Even if GCC was not interested in using LLVM itself as a component (which
I really don't expect), some of the ideas could be incorporated into the
SIMPLE representation to get some of the advantages.
What exactly is gained by converting back to a tree representation
afterwards? Just simplified integration with the compiler as it stands
now?
> with optional components in "-> [...] ->". These components are not mandatory
> for building the GCC compiler. They could be built apart once the g++ compiler
> is available. The AST optimizer could be loaded from a .so file (as in open64
> compiler) and thus avoiding to grow too much the size of the executable when
> AST optimization is not needed.
Sure, makes sense. There are two ways I see to completely get LLVM out of
the pictures when building from an untrusted compiler:
1. Have the simplest LLVM components be written in C, including the
LLVM->RTL conversion.
2. Convert directly from generic trees to RTL when bootstrapping.
I would advocate #2 personally, simply because it has already been done
before in the past and could easily be resurrected. The only problem
would be that two code paths would have to be maintained: Tree->RTL and
Tree->LLVM. Although not a huge deal (a lot of stuff could be factored
out as LLVM and RTL are both "low level" representations), it's still
suboptimal.
Anyway, I'd love to discuss various IR design issues. If anyone would
like to talk about the Pros/cons of optimizing on a tree vs a low-level
IR with type information, I can certainly talk about the experience I have
with it. Hopefully I can convince you that tree's are not really
neccesary at all. :) :)
-Chris
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/