This is the mail archive of the
mailing list for the GCC project.
Proposal for a 'gcc -O4': interprocedural optimization
- From: Chris Lattner <sabre at nondot dot org>
- To: <gcc at gcc dot gnu dot org>
- Date: Fri, 23 Aug 2002 17:34:43 -0500 (CDT)
- Subject: Proposal for a 'gcc -O4': interprocedural optimization
One aspect of GCC that is not very well developed are capabilities for
interprocedural analysis and optimization (especially across translation
unit boundaries). Here I describe one reasonable (and tested) approach to
adding this to GCC, which would be invoked when compiling at a level of
"-O4" (or perhaps some other interface, the actual command line option
doesn't really matter). At the end of this email there is some discussion
of how this interacts with the established momentum in the GCC community.
Currently, the GCC pipeline (on the tree-ssa branch) looks like this:
LanguageSpecificAST SIMPLE trees RTL
parser -> simplifier -> optimizer -> expander -> RTL Opt -> machine code
Essentially, a front-end generates a language specific version of an AST
for the file being compiled, performs language specific optimizations on
it, and runs it through a language specific simplifier. The output of the
simplifier is a common language, currently SIMPLE. Once in the language
independant SIMPLE form, it is optimized, converted to RTL, optimized in
RTL form, then emitted as machine code. The machine code is assembled and
linked effectively by the system utilities.
I propose that when compiling with "-O4" enabled that the compiler
be reduced into this:
LanguageSpecificAST SIMPLE trees
parser -> simplifier -> optimizer -> "simple code"
... preserving higher level information (than RTL or machine code that is)
until link time. The .o files generated by the modified compiler would
not be populated with machine code, but instead would contain a serialized
form of the SIMPLE trees (or some other representation).
The linker then would look like this::
"simple code" -\
"simple code" -/ SIMPLE link -> IP optimize -> expander -> RTL opt ->
machine code -> assembler--\
object code ------------------------------------ native link
native library ---------------------------------/
In other words, we preserve as much code as possible in SIMPLE form until
link time where we can do more aggressive interprocedural optimizations
(for example, real IP alias analysis & inlining). After these
optimizations are performed, the code is compiled with the traditional GCC
backend just as it is today.
As mentioned in another thread, my group has already done this (with the
exception of reusing the GCC backend for native code generation), so this
is not a pie-in-the-sky idea, and can yield excellent performance for the
There are a few things that are key to making this work:
1. The representation chosen has to be low level enough that a large part
of the traditional optimizations can be performed at compile time, to
avoid really expensive relink times. SIMPLE fits well here.
2. Ideally the representation should be cleanly defined so that there is
a well defined mapping between all of the front-ends involved. I
believe this is already a goal with the SIMPLE work.
3. The representation should have a fast way to serialize and deserialize
it. At present, I think that there isn't really a good way for this
to happen with SIMPLE, although it may be possible to add this in the
future. Currently, if I understand correctly, the in-memory
representation is effectively serialized onto disk, which does not
seem like a very future-proof or structured format for storing code.
I think that adding this ability to GCC would make a lot of the user base
happy. Faster executables are always good, and there are often requests
for adding interprocedural optimizations to GCC (mostly from academic type
people that are not very familiar with the GCC architecture).
So I promised to talk about current momentum in the GCC sourcebase:
The tree-ssa/ast-optimizer branch people seem to be doing almost all of
the work required to make this possible already: They are working on
reimplementing optimizations on the SIMPLE representation so that they can
be done more efficiently or better than on RTL. RTL is wonderful for
machine specific optimization and very late code generation, but is not a
wonderful target for mid-level optimization.
The pfe/precompiled header work is working on one aspect of this work, and
it has been proposed in the past that it be used for interprocedural
optimization: basically they provide a mechanism for serializing the
in-memory representation of the tree data structures at any given time.
This can also be used to serialize the tree representations for entire
functions at a time, to be used for interprocedural optimization at link
time. The way my proposal differs from this possible strategy is that it
is a bit more structured and is more likely to be implementable in
practice. This is open for discussion of course.
As I mentioned, or group has a lot of experience with this kind of thing
and a good sized library of efficient optimizations... I would really like
to see GCC get some of the kickbacks from our work, especially if it makes
GCC a more powerful and useful compiler.
If this was implemented, how likely is it to run into political problems
that could prevent it from being accepted into the official GCC CVS tree?
Are there technical details I should expound upon?