This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
[RFC] New intermediate languages
- To: gcc at gcc dot gnu dot org
- Subject: [RFC] New intermediate languages
- From: Diego Novillo <dnovillo at redhat dot com>
- Date: Thu, 1 Feb 2001 01:55:01 -0500
- Organization: Red Hat Canada
I am working on some infrastructure changes to introduce multiple
intermediate languages in the compiler. The primary goal is to
have a gradual IL lowering scheme to allow the implementation of
analysis and optimizing transformations at the appropriate level
of abstraction.
Essentially, I want to remove the tree<->RTL interface and
introduce one or more intermediate representations in between the
two. In addition, transformations that are currently done at the
RTL level should be moved (or replicated) to higher level ILs
(eg, LCM, constant propagation, control flow transformations,
loop transformations).
An important design consideration is to minimize the impact these
changes are going to make to the rest of the compiler. This is
not something that will happen overnight, obviously. These
changes are very pervasive and introducing them in one monstrous
patch would be impossible.
As a first step, I am implementing an SSA representation for
trees. Although trees may not be the ideal IL for an SSA
representation, they do have sufficient information to get the
job done: we can build a flowgraph and find def/uses of program
variables. The implementation does the following (so far I've
only dealt with the C front-end):
1- In c_expand_body() we insert a call to the tree optimizer
before calling expand_stmt().
2- The tree optimizer walks the tree representation of the
function looking for defs/uses of scalar variables (I'm
avoiding array/structure references for the time being).
Defs/uses are stored in a separate data structure with
pointers into the trees.
3- Use find_basic_blocks to build a flowgraph. Here we need to
make a few changes. Currently, flow.c assumes that basic
blocks contain RTL.
I am making changes so that basic blocks contain a union of
trees and RTL, and the generic flow routines use callbacks for
IL specific things like counting basic blocks and finding
block boundaries. Although I'm not done yet, it doesn't seem
that I'll need to change too many things. Once I have this I
can re-use all the CFG related functions with impunity.
4- Building the SSA form can be done without modifying the trees.
I don't think I want to try and re-use the current SSA code, I
do not believe in SSA re-writing the program. It's better to
overlay the SSA information as a web of pointers on top of the
CFG.
Things I see happening at this level are control flow clean-ups,
constant propagation (SCC), PRE, and things like that. Using a
similar approach we can build a loop oriented IL to do array and
loop transformations. Trees already seem to contain much of what
we need.
I realize that we are gearing toward GCC3.0 so I don't expect to
be introducing these changes anytime soon. However, I should have
patches ready in the next few days if anyone is interested.
Nothing is set in stone, new ideas and patches are most welcome.
Another idea is to start implementing a mid-level RTL
representation (MRTL). The key idea is to reduce target ties by
using canonical RTL patterns for everything needed by the
tree->RTL transformation pass. This is more of a long-term thing.
Currently the tree<->RTL interface is a bit too intertwined.
These are some of the general properties of MRTL.
- Reduced target dependencies. The RTL patterns generated at this
stage are fixed and have well-known semantics. Some
architectural features will likely creep in, like word size,
number of physical registers and whatnot.
- No information loss in the tree->MRTL transition. After the
MRTL form is transformed it will not be possible to go back
to the tree information.
- Control flow representation is similar to RTL. No structured
control-flow, everything is represented with jumps and
compare/branch.
- Arrays are lowered to pointer arithmetic and structures
flattened. No libcalls are generated for floating point and/or
integer arithmetic.
These patterns would be generated directly from stmt.c/expr.c.
The plan is to write two passes tree->MRTL and MRTL->RTL which
should not change existing behaviour. Once this is done, I would
start building a CFG and an SSA form for MRTL (this can be mostly
ripped off the current implementation). At that point we can
start porting scalar transformations to the new form.
Data types
----------
Use the same modes defined in RTL.
Control Statements
------------------
(insn)
(mcall (addr) (arg1) (arg2) ... (argN))
This is a new pattern for function calls. Alternatively we
could define an abstract stack machine and expand calls using
the existing code. I'd rather see a more compact
representation, though.
(jump_insn (set (pc) (expr)))
Standard jump. expr can be either a label_ref or an
if_then_else for unconditional and conditional jumps.
(return)
(return X)
Standard return with and without a value.
(code_label)
(barrier)
(note NOTE_*)
Expressions
-----------
Constants:
(const_int i)
(const_double:m addr i0 i1)
(const_string str)
(symbol_ref:m symbol)
(label_ref label)
Registers/memory:
(mem:m addr alias)
(reg:m n)
(pc)
(set lval x)
Arithmetic
(plus:m x y)
(minus:m x y)
(compare:m x y)
(neg:m x)
(mult:m x y)
(div:m x y)
(udiv:m x y)
(mod:m x y)
(umod:m x y)
(smin:m x y)
(smax:m x y)
(umin:m x y)
(umax:m x y)
(not:m x)
(and:m x y)
(ior:m x y)
(xor:m x y)
(ashift:m x c)
(lshiftrt:m x c)
(ashiftrt:m x c)
(rotate:m x c)
(rotatert:m x c)
(abs:m x)
Comparison
(eq:m x y)
(ne:m x y)
(gt:m x y)
(gtu:m x y)
(lt:m x y)
(ltu:m x y)
(ge:m x y)
(geu:m x y)
(le:m x y)
(leu:m x y)
(unord:m x y)
(if_then_else cond then else)
Conversion
(sign_extend:m x)
(zero_extend:m x)
(float_extend:m x)
(truncate:m x)
(float_truncate:m x)
(float:m x)
(unsigned_float:m x)
(fix:m x)
(unsigned_fix:m x)
(fix:m x)
Other
(asm_input s)