This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [tree-ssa] Temporary Expression Replacement in SSA->normal.


On Fri, 12 Dec 2003, Zack Weinberg wrote:
> >> Synthesis of complex machine instructions, where available, is
> >> combine's job.
> >
> > Unfortunately not.  There are a number of program optimizations that
> > only get performed during RTL expansion, and the larger the tree that
> > is passed to expand_expr the better the code that GCC generates.
> ...
>
> You give a lot of examples, and your point is well taken, but I am not
> convinced that tree->RTL conversion is the right place to perform any
> of these transformations.  For most of them, it seems to me that the
> transformation ought to have happened much earlier - during SSA-form
> constant folding and propagation.

I don't disagree that many of the extremely important optimizations
performed during RTL expansion could be performed earlier during
tree-ssa.  The pragmatic problem, addressed by Andrew's patch, is that
the current situation is that these optimizations are neither performed
by tree-ssa, nor by RTL expansion of lowered GIMPLE, which results
in a significant performance penalty.


Indeed, the reason that tree-ssa generates worse code than mainline,
even with all of the existing RTL optimizers still enabled, is not
because SSA itself serious pessimizes the code "per se", but from a
mismatch between the kinds of trees it expands, and those that the
RTL optimizers are used to optimizing.



> I'm not sure what you mean by 'fold() canonicalizes many expressions
> assuming that the results will be visible...' so I can't comment on
> that.

Sometimes, the best tree representation of performing transformations
and optimizations is not the same as the most efficient to execute.
For example, we can combine and factor multiplications and array
references easier if they are represented as multiplications and
array references, but they are executed more efficiently, as shifts
and additions and pointer arithmetic.

To solve this problem, GCC's middle-end performs some tree optimizations
in two passes.  The first "fold" attempts to raise lower level constructs
into higher level ones.  x + x => 2*x, x * x => pow(x,2), x << 3 => 8*x.
The second "expand_expr", then lowers these representations back down
again if they weren't actually combined.  2*x => x+x, pow(x,2) => x*x,
8*x => x << 3, etc...   Clearly you can't go both ways in a single pass
without an infinite loop.

However, transforming "x + x" into 2*x during fold, makes the implicit
assumption that the tree "2*x" will be seen by expand_expr, so that it
can be converted back to "x+x" without pessimizing the code.  If
expand_expr, only sees "(y = 2, x*y)" then we'd produce worse code.


> > Even with improvements to RTL optimizers, "combine" only looks at a
> > maximum of three instructions at a time.  This is why my testcase
> > for gcc.c-torture/compile/20020720-1.c fails on so many platforms,
> > but is performed trivially by tree-ssa, if all the necessary bits
> > can be passed to the constant folder at the same time.  Another
> > example is "(((A + B) + C) + D) - A" where the RTL optimizers just
> > can't help.
>
> This should be trivial for SSA CCP, no?  Which is again an argument
> for doing these transformations while we still have SSA form
> available.

I'll admit that I'm not sure.  If SSA CCP can optimize

T.1 = A + B
T.2 = T.1 + C
T.3 = T.2 + D
T.4 = T.3 - A

then I'd be impressed.  It's not impossible, just that I don't think
that deep reassociation is currently implemented.  I may be wrong.


> I do think an enhanced combine that can look at complete dependency
> graphs (somehow) is worthwhile.

There's always the combinatorial explosion worries with combine, and
even going to four instructions instead of three is likely to be
expensive.  I do have a patch waiting for 3.5 that makes use of
REG_EQUAL notes in combine, that will give us slightly more lookahead.



I'm not wedded to reconstituting GENERIC from GIMPLE before RTL
expansion, but RTL expansion is perhaps one of the most important,
but often overlooked, passes in GCC.  You can write a compiler
without constant folding, CSE, GCSE, loop, combine, jump, scheduling,
SSA etc...  Making things needlessly difficult for expand_expr should
be considered as heinous as complicating things for the register
allocator.

Roger
--


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]