This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: My opinions on tree-level and RTL-level optimization


On Apr 18, 2005 02:51 PM, Richard Kenner <kenner@vlsi1.ultra.nyu.edu> wrote:

> >     Unfortunately you appear to have little clue what you are really
> >     talking about.  So let me provide you with some loud feedback as well.
> 
> Please try to keep this discussion on a civil level!

I am (for a change, maybe) not the one who started making the
discussion uncivil.  And what I wrote is not untrue either. Roger
didn't exactly display knowledge of what he was talking about, see
the many false statements he made about the RTL optimizers.


> >     You seem to have a misguided view on what CSE does.  What we call CSE
> >     is not a common subexpression elimination pass.  It just happens to
> >     catch some CSEs as well, but the most important thing it does right
> >     now is propagation things forward to select better addressing modes
> >     (essentially a kind of instruction selection), and limited
> >     folding/simplifying to clean up the sometimes terrible things 'expand'
> >     can do.
> 
> This is a very inaccurate characterization of CSE.  Yes, it does those
> things, but eliminating common subexpressions is indeed the major task
> it performs.

Only on the first pass.  I have looked at how many actual common
subexpressions it eliminates in the two passes following GCSE, and
it is really almost nothing.  CSE2 only catches a significant number
if flag_loop_optimize2 and/or flag_tracer are set. CSE1 does catch
some common subexpressions, most of those appear to come from
expanding a tree expression to multiple RTL expressions (e.g. two
tree MULT_EXPRs for which expand_* create a cse).

It is undisputed that CSE _can_ remove common subexpressions.  It
really just does not appear to do that very often in practice.

It is unfortunately hard to tell exactly what CSE does, because it
is doing so many things and it does not report the replacements it
makes.  When I looked at this, I simply disabled various parts of
CSE to see what the effect would be on the resulting code.  I also
disabled CSE1 completely and replaced it with a cselib based pass,
which didn't catch many cses and missed many of the other things
CSE does (interestingly, the effect was worst for -fPIC, but I have
not yet figured out why).


> >     (Kenner mentioned CSE around loops, but that is already gone.)
> 
> Sorry. But that is the cheaper of the two kludges (and I'm allowed to use
> that word since I wrote them).

:-)  I didn't know you wrote that.  It certainly wasn't a really
problematic piece of code, I think.  When I removed it, the plan was
to make CSE work on extended basic blocks.  But it turned out that
CSE around basic blocks (-fcse-skip-blocks) was still a very useful
thing to do (and it still was, when I looked at it again a couple of
weeks ago).


> >     I agree that some of the details of the machine-dependent parts of the
> >     tree optimizers are not the most elegant.  
> 
> I think there's a serious conceptual issue in making the tree level too
> machine-dependent.

Agreed with "too machine-dependent".  But what is that, exactly?  No
machine dependence at all?  Some, and if so, what?  I'm curious what
you think about this, I have no idea yet, really.  I tried a simple
lowering pass last year, but it didn't work out very well.  I didn't
try very hard either, though.

>  The *whole point* of doing tree-level optimizations
> is to do machine-*independent* optimizations.  Trees are machine-independent
> and RTL is machine-dependent.  If we go too far away from that, I think
> we miss the point.

Some machine-dependence would not be bad, I think.  Sure, you would
never want to explicitly expose all the details about the target
details.  But a machine specific lowering pass, for example, would
probably be a good thing.  Just to expose more expressions to the
tree optimizers, so they can do more work and the RTL optimizers do
not have to do that much work. Many other compilers do this, also.
I know that has never been a convincing argument on this list, but
it is still a sign that it is not all that insane to do, at least.


> >     Besides, the RTL optimizers are not exactly a part of GCC to be proud
> >     of if "ugliness" is a measure.
> 
> Really?

Well, yes.  I mean no offense to the people who wrote RTL optimizers
in the past, when many pieces of infrastructure we take for granted
now were simply not available.  But many passes could certainly use
a good cleanup or rewrite.  But, much to the credit of their authors,
many of the existing RTL passes just _work_ quite well, which is
why it is so difficult to rewrite them :-/

If you look at e.g. how regmove handles (or really does not handle)
basic block boundaries, or at how CSE does its path following (which,
no doubt, made sense when it was written that way), or at the kludges
in sched-ebb.c (didn't write it, still use the word ;-) to tear down
and later on fix up basic block boundaries, then yes, that is ugly.

Similarly, if you see how heavily some ports rely on reload to fix up
instructions, and how far away from actual machine instructions RTL
can sometimes be before reload, then, again, yes that is pretty ugly,
too.  Not to mention how _hard_ it is to change an RTL optimizer and
not introduce a regression somewhere, because everything depends so
strongly on each other, instead of being modular.


>     Of course GCC will always need a low-level IR.  But, combine is
>     instruction selection in the worst possible way; 
> 
> It served GCC well for decades, so I hardly think that's a fair statement.

It serves GCC well still today.  That does not mean it is the right
way to do it.

Combine finds the insns to combine in a rather random pick-and-try
fashion, with at most two or three instructions.  And it intermixes
the actual instruction (or rather, insn) selection with a bunch of
optimizations, which IMHO should be split out into a separate pass.

A more common instruction selection technique in modern compilers
is to use a tree rewriting approach.  For GCC, such a pass would
rewrite (probably lowered) trees to RTL using a machine generated
'tree' parser.  Google for BURS and iburg for examples.

Unfortunately it is not even possible to try this approach for GCC
because the actual asm instructions (the 'alternative' in GCC terms)
are not selected until reload.


>     reload is register allocation in the worst possible way, 
> 
> Reload is not supposed to do register allocation.  To the extent that
> it does, I agree with you.  But what this has to do with the issue of
> tree vs. RTL optimization is something I don't follow.

It has nothing to do with tree vs. RTL at all.  It was part of my
reply to a quote from Roger's mail that you cut out above.  That
quote suggests he thinks reload and register allocation is the same
thing:
> > > It's unlikely that RTL, or an instruction-based, representation of
> > > a program will ever go away.  And the RTL passes corresponding to
> > > instruction selection (combine), register allocation (reload) and
> > > scheduling will almost certainly be RTL-level tasks/passes.

> Surely you
> aren't suggesting doing register allocation at the tree level?

Oh, definitely not, no.  Heh.  Now _that_ would be really ugly, if
it is possible at all.  Linus suggested once in this list that GCC
should do that... ;-)

Gr.
Steven



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]