This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Language-independent functions-as-trees representation
- From: Diego Novillo <dnovillo at redhat dot com>
- To: Jason Merrill <jason at redhat dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Fri, 23 Aug 2002 15:16:21 -0400
- Subject: Re: Language-independent functions-as-trees representation
- Organization: Red Hat Canada
- References: <wvllm86k1m1.fsf@prospero.cambridge.redhat.com> <wvl65y1px1j.fsf@prospero.cambridge.redhat.com>
On Fri, 23 Aug 2002, Jason Merrill wrote:
> Basically, I'm wondering if other people are interested enough in this
> change to 1) help with the design for the IR and 2) help with porting the
> existing tree optimizers to the new IR.
>
Yes to both. I'm actually more interested in #2 but can help
with #1 by finding rough spots when building a flowgraph, SSA and
the other transformations.
> rth has raised some questions about the advisability of using COMPOUND_EXPR
> to chain statements; the current scheme uses TREE_CHAIN of the statements
> themselves. To me, the benefit is modularity; apart from the earlier
> complaints about the STMT/EXPR distinction, using COMPOUND_EXPR makes it
> easy to replace a single complex expression with a sequence of simple ones,
> simply by plugging in a COMPOUND_EXPR in its place. The current scheme
> requires a lot more pointer management in order to splice the new STMTs in
> at both ends.
>
From an optimizer perspective, all you really want is be able to
traverse all the statements in a basic block using a single
traversal. The current scheme of having COMPOUND_STMT (or _EXPR)
start a new chain is pretty convenient for building the
flowgraph. It lets you keep the nesting information in the
flowgraph itself (basic blocks know exactly which block starts
the scope they belong to without having to compute dominance
info).
> It seems to me that double-chaining could be provided by using the
> TREE_CHAIN of the COMPOUND_EXPRs.
>
How so? I'm afraid I'm not following you here.
> loop-stmt:
> LOOP_EXPR
> LOOP_EXPR_BODY -> stmt | NULL_TREE
> | DO_LOOP_EXPR
> (to be defined later)
>
I would prefer to have 2 generic loop expressions and 1 Fortran
do-loop expression. The two generic loops would be for
do-while and while constructs. However, we could get by with
just a do-while guarded with an if() (a-la SUIF). For the
do-loop expression, I'd like:
DO_LOOP_EXPR
INDEX_DECL -> var
LB_EXPR -> val
UB_EXPR -> val
COND_CODE -> '<' | '>' | '<=' | '>='
STEP_EXPR -> val
LANDING_EXPR -> COMPOUND_EXPR
BODY_EXPR -> COMPOUND_EXPR
This is equivalent to the following:
for (INDEX_DECL = LB_EXPR;
INDEX_DECL COND_CODE UB_EXPR;
INDEX_DECL += STEP_EXPR)
{
if (INDEX_DECL == LB_EXPR)
LANDING_EXPR;
else
BODY_EXPR;
}
LANDING_EXPR is just a convenience for optimizers to have
somewhere to blindly move invariant code to. This would be
expanded to something like:
INDEX_DECL = LB_EXPR;
if (INDEX_DECL COND_CODE UB_EXPR)
LANDING_EXPR;
for (...)
We could do without it. Another important feature of
DO_LOOP_EXPR is that both UB_EXPR and STEP_EXPR should be loop
invariant (ideally, constants). Anything that the front-end
can't (or won't) convert directly can be dealt with a loop
canonicalization pass which would do some basic data flow
analysis to re-write WHILE_EXPRs.
Another thing I'd like to have in each statement is the concept
of parent statement. This is a pointer to the immediately
enclosing control statement. We currently gather this
information while building the CFG, so if it's too difficult to
gather while parsing, we could fill it out while building the
CFG.
> The Java loop has 1 (or 0) EXIT_EXPR, used to express the loop condition.
> This makes it easy to distinguish from 'break's, which are expressed
> with EXIT_BLOCK_EXPR.
>
> EXIT_EXPR is a bit backwards for this purpose, as its sense is opposite to
> that of the loop condition, so we end up calling invert_truthvalue twice in
> the process of generating and expanding it. But that's not a big deal.
>
> >From an optimization perspective, are LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR
> easier to deal with than plain gotos? I assume they're preferable to the
> current loosely bound BREAK_STMT, which has no information about what it's
> exiting. EXIT_EXPR would have the same problem if it were used to express
> 'break'.
>
What's wrong with BREAK_STMT? As long as you know its parent,
it's relatively easy to figure out where to send the flow edge.
GOTO_STMTs are just as easy to deal with. I don't have a problem
with the existing setup. I don't know about
LABELED_BLOCK_EXPR/EXIT_BLOCK_EXPR.
> I've attached my current patch below; it's still in the toy stages, but
> gives an idea of my implementation strategy.
>
Note that any changes you make to the tree representation are
very likely to break the CFG and SSA builders. We want to
experiment, but we also need to make sure that any changes we
make to the IL are reflected in the CFG and dataflow info.
I'm all for experimenting with these IL changes, but I would
first want to have something like SSA-CCP working so that any
IL changes that break SSA are immediately detected with a
bootstrap/c-torture failure.
Diego.