This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: My opinions on tree-level and RTL-level optimization
- From: Paolo Bonzini <paolo dot bonzini at lu dot unisi dot ch>
- To: GCC Development <gcc at gcc dot gnu dot org>, Roger Sayle <roger at eyesopen dot com>, kenner at vlsi1 dot ultra dot nyu dot edu
- Date: Mon, 18 Apr 2005 16:37:15 +0200
- Subject: Re: My opinions on tree-level and RTL-level optimization
- References: <10504181251.AA12934@vlsi1.ultra.nyu.edu>
I think Roger simply mis-spoke because in his original message, he
said what you said: the important issue is having the alias
information available in RTL. Much (but not all: e.g., SUBREG info) of
that information is best imported down from the tree level.
Well, paradoxical subregs are just a mess: optimizations on paradoxical
subregs are better served at the tree level, because it is just
obfuscation of e.g. QImode arithmetic.
Indeed, my patch removed an optimization on paradoxical subregs, and
kept an optimization on non-paradoxical subregs.
Take this code:
long long a, b, c, d;
int x;
...
c = a * b;
d = (int) x * (a * b);
In my view, tree-level optimization will catch (a * b) as a redundant
expression. RTL-level optimization will catch that the high-part of
"(int) x" is zero.
Roger proposed lowering 64-bit arithmetic to 32-bit in tree-ssa! How
would you do it? Take
long long a, b, c;
c = a + b;
Would it be
c = ((unsigned int) a + (unsigned int) b)
+ ((long long) ((int) (a >> 32) + (int) (b >> 32)
+ ((unsigned int) a + (unsigned int) b < (unsigned int) a)) << 32);
Or will you introduce new tree codes and uglify tree-ssa? Seriously...
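To make the cost concrete, here is a rough sketch of what a correct lowering of a 64-bit add into 32-bit pieces has to spell out in plain C. The names add64_lowered and check_add64_lowered are made up for illustration, and this is only an assumption about the shape of such code, not what any GCC pass emits. The carry out of the low word is detected by checking whether the 32-bit sum wrapped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: add two 64-bit values using only 32-bit
   additions, the way a lowering would have to express it.  */
int64_t
add64_lowered (int64_t a, int64_t b)
{
  uint32_t a_lo = (uint32_t) a;
  uint32_t b_lo = (uint32_t) b;
  uint32_t a_hi = (uint32_t) ((uint64_t) a >> 32);
  uint32_t b_hi = (uint32_t) ((uint64_t) b >> 32);

  uint32_t lo = a_lo + b_lo;           /* wraps modulo 2^32 */
  uint32_t carry = lo < a_lo;          /* 1 iff the low add wrapped */
  uint32_t hi = a_hi + b_hi + carry;   /* wraps modulo 2^32 */

  /* Reassemble the two halves.  The final conversion to a signed
     type relies on GCC's modulo-2^64 behavior for out-of-range
     values.  */
  return (int64_t) (((uint64_t) hi << 32) | lo);
}

/* Small self-check on inputs where a + b does not overflow.  */
void
check_add64_lowered (void)
{
  assert (add64_lowered (1, 2) == 3);
  assert (add64_lowered (0xffffffffLL, 1) == 0x100000000LL);
  assert (add64_lowered (-1, 1) == 0);
}
```

Even this small case needs an explicit carry and a reassembly step, and every later tree pass would have to see through all of it.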
This is a very inaccurate characterization of CSE. Yes, it does those
things, but eliminating common subexpressions is indeed the major task
it performs.
It was. Right now, the only thing that fold_rtx tries to simplify is
(mult:SI (reg:SI 58) 8)
to
(ashift:SI (reg:SI 58) 3)
Only to find out it is not a valid memory_operand... I have a patch to
completely disable calling fold_rtx recursively, only equiv_constant.
That was meant to be part 3/n of the cleanup fold_rtx series. I was
prepared to take responsibility for every pessimization resulting from
these cleanups, and I expected to find a better way to do the same
thing.
A 7000-line constant propagator...
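For reference, the arithmetic behind that rewrite: multiplying by 8 is a left shift by 3 (ashift in RTL terms), while an arithmetic right shift by 3 (ashiftrt) divides instead. A minimal C sketch, with function names invented for illustration and assuming 32-bit SImode values:

```c
#include <assert.h>
#include <stdint.h>

/* Corresponds to (mult:SI r 8): scale a value by 8.  */
uint32_t
scale_mult (uint32_t r)
{
  return r * 8u;
}

/* Corresponds to (ashift:SI r 3): same value, written as a left shift.  */
uint32_t
scale_ashift (uint32_t r)
{
  return r << 3;
}

/* Corresponds to (ashiftrt:SI r 3): a right shift, which is NOT
   equivalent to the multiplication.  Only non-negative inputs are
   used here, so the result does not depend on how the compiler
   shifts negative values.  */
int32_t
shift_right_3 (int32_t r)
{
  return r >> 3;
}
```

The two scaled forms compute the same value; whether a given target accepts one or the other inside an address is a separate, machine-dependent question.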
I think there's a serious conceptual issue in making the tree level too
machine-dependent. The *whole point* of doing tree-level optimizations
is to do machine-*independent* optimizations. Trees are machine-independent
and RTL is machine-dependent. If we go too far away from that, I think
we miss the point.
No, the whole point of doing tree-level optimizations is to be aware of
high-level concepts before they are lowered. No need to worry about
support for QImode-size arithmetic. No need to worry if 64-bit
multiplication had to be lowered.
Besides, the RTL optimizers are not exactly a part of GCC to be proud
of if "ugliness" is a measure.
Really?
The biggest and least readable files right now are combine.c, reload.c,
reload1.c. cse.c is big (though not extreme) but unreadable.
OTOH, stuff like simplify-rtx.c or especially fold-const.c is big but
readable.
Of course GCC will always need a low-level IR. But, combine is
instruction selection in the worst possible way;
It served GCC well for decades, so I hardly think that's a fair statement.
Never heard about dynamic programming?
reload is register allocation in the worst possible way,
Reload is not supposed to do register allocation. To the extent that
it does, I agree with you. But what this has to do with the issue of
tree vs. RTL optimization is something I don't follow. Surely you
aren't suggesting doing register allocation at the tree level?
No, he's suggesting cleaning up stuff, so that it is easier to stop
doing things in the worst possible way. He's suggesting to be realistic
once code has run completely out of control.
Luckily some GWP people do care about cleaning up. Richard Henderson
did a lot of work on cleaning up RTL things left from olden times (think
EH, nested functions, addressof, save_expr, ...), Zack did some work on
this ground in the past as well, Bernd is maybe the only guy who could
pursue something such as reload-branch...
I hate to make "clubs" out of a community, but it looks like only some
people care about the state of the code... Steven has done most of the
work for removing the define_function_unit processor descriptions. I
removed ~5000 lines of code after tree-ssa went in (including awful
stuff such as protect_from_queue, which made sense maybe in 1990, and
half of stmt.c). Kazu is also in the CSE-cleanup game. Maybe, like in
my case, it's only because I have limited time to spend on GCC and think
that cleaning up is a productive way to use this time. But anyway I
think it is worth the effort.
Paolo