This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: My opinions on tree-level and RTL-level optimization


I think Roger simply mis-spoke because in his original message, he
said what you said: the important issue is having the alias
information available in RTL.  Much (but not all: e.g., SUBREG info) of
that information is best imported down from the tree level.

Well, paradoxical subregs are just a mess: optimizations on paradoxical subregs are better served at the tree level, because such subregs are just an obfuscation of, e.g., QImode arithmetic.


Indeed, my patch removed an optimization on paradoxical subregs, and kept an optimization on non-paradoxical subregs.

Take this code:

    long long a, b, c, d;
    int x;
    ...
    c = a * b;
    d = (int) x * (a * b);

In my view, tree-level optimization will catch (a * b) as a redundant expression. RTL-level optimization will catch that the high part of "(int) x" is just the sign extension of its low part.
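To make the tree-level half of that concrete, here is a minimal hand-written sketch of the code before and after the redundant (a * b) is shared. The function names, the temporary t and the check helper are hypothetical, chosen only for illustration; this is not GCC output.

```c
#include <assert.h>

/* As written in the source: a * b is computed twice. */
void before_cse(long long a, long long b, int x,
                long long *c, long long *d)
{
    *c = a * b;
    *d = (int) x * (a * b);
}

/* Roughly what a tree-level redundancy pass leaves behind:
   the product is computed once and reused. */
void after_cse(long long a, long long b, int x,
               long long *c, long long *d)
{
    long long t = a * b;   /* hypothetical temporary */
    *c = t;
    *d = x * t;
}

/* Both versions agree on inputs that do not overflow. */
void check_cse(void)
{
    long long c1, d1, c2, d2;
    before_cse(7, -3, 2, &c1, &d1);
    after_cse(7, -3, 2, &c2, &d2);
    assert(c1 == c2 && d1 == d2);
}
```

Nothing here needs to know how a 64-bit multiply is implemented on the target, which is why the tree level is the natural place to catch it.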

Roger proposed lowering 64-bit arithmetic to 32-bit in tree-ssa! How would you do it? Take

    long long a, b, c;
    c = a + b;

Would it be

    c = ((unsigned int) a + (unsigned int) b)
        + ((long long) ((int) (a >> 32) + (int) (b >> 32)
                        + ((unsigned int) a + (unsigned int) b
                           < (unsigned int) a)) << 32);

Or will you introduce new tree codes and uglify tree-ssa? Seriously...
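For reference, this is a minimal runnable sketch of what such a lowering has to compute, written with unsigned 32-bit halves so that wraparound is well defined. The names add64_via_32 and check_add64 are hypothetical; on a two's-complement target the same bit pattern results for signed long long.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 64-bit values using only 32-bit additions and one comparison. */
uint64_t add64_via_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t) a, a_hi = (uint32_t) (a >> 32);
    uint32_t b_lo = (uint32_t) b, b_hi = (uint32_t) (b >> 32);

    uint32_t lo = a_lo + b_lo;          /* wraps modulo 2^32 */
    uint32_t carry = lo < a_lo;         /* 1 if the low add wrapped */
    uint32_t hi = a_hi + b_hi + carry;  /* wraps modulo 2^32 */

    return ((uint64_t) hi << 32) | lo;
}

/* The split version matches a native 64-bit add. */
void check_add64(void)
{
    uint64_t a = 0x00000001FFFFFFFFull, b = 0x0000000000000001ull;
    assert(add64_via_32(a, b) == a + b);
}
```

The carry comes from comparing the low sum against one of its operands, not from comparing the two operands with each other. Even in this small form the expression is much noisier than c = a + b, which is the point of the objection.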

This is a very inaccurate characterization of CSE.  Yes, it does those
things, but eliminating common subexpressions is indeed the major task
it performs.

It was. Right now, the only thing that fold_rtx tries to simplify is


(mult:SI (reg:SI 58) 8)

to

(ashift:SI (reg:SI 58) 3)

Only to find out it is not a valid memory_operand... I have a patch that stops fold_rtx from calling itself recursively, so that it only calls equiv_constant. That was meant to be part 3/n of the fold_rtx cleanup series. I was prepared to take responsibility for every pessimization resulting from these cleanups, and I was confident I'd find a better way to do the same thing.
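The arithmetic identity behind that rewrite is simply that multiplying by 8 equals shifting left by 3. A minimal sketch in C, with hypothetical function names, using unsigned 32-bit values so that overflow wraps the same way in both forms:

```c
#include <assert.h>
#include <stdint.h>

/* The multiply form, as it typically appears in a scaled address. */
uint32_t times8_mult(uint32_t r)  { return r * 8u; }

/* The shift form that the simplification produces. */
uint32_t times8_shift(uint32_t r) { return r << 3; }

/* Both forms give the same value, including when the result wraps. */
void check_times8(void)
{
    assert(times8_mult(58u) == times8_shift(58u));
    assert(times8_mult(0xFFFFFFFFu) == times8_shift(0xFFFFFFFFu));
}
```

The two are equal as values, but not as address forms: as far as I know, a scaled index inside a MEM is expected in the mult form, so the shift version fails to match and the work is thrown away.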

A 7000-line constant propagator...

I think there's a serious conceptual issue in making the tree level too
machine-dependent.  The *whole point* of doing tree-level optimizations
is to do machine-*independent* optimizations.  Trees are machine-independent
and RTL is machine-dependent.  If we go too far away from that, I think
we miss the point.

No, the whole point of doing tree-level optimizations is to be aware of high-level concepts before they are lowered. No need to worry about support for QImode-size arithmetic. No need to worry if 64-bit multiplication had to be lowered.


    Besides, the RTL optimizers are not exactly a part of GCC to be proud
    of if "ugliness" is a measure.

Really?

The biggest and least readable files right now are combine.c, reload.c, reload1.c. cse.c is big (though not extreme) but unreadable.


OTOH, stuff like simplify-rtx.c or especially fold-const.c is big but readable.

Of course GCC will always need a low-level IR. But, combine is
instruction selection in the worst possible way;


It served GCC well for decades, so I hardly think that's a fair statement.

Never heard about dynamic programming?


reload is register allocation in the worst possible way,

Reload is not supposed to do register allocation.  To the extent that
it does, I agree with you.  But what this has to do with the issue of
tree vs. RTL optimization is something I don't follow.  Surely you
aren't suggesting doing register allocation at the tree level?

No, he's suggesting cleaning up stuff, so that it is easier to stop doing things in the worst possible way. He's suggesting being realistic once code has run completely out of control.


Luckily some GWP people do care about cleaning up. Richard Henderson did a lot of work on cleaning up RTL things left from olden times (think EH, nested functions, addressof, save_expr, ...), Zack did some work in this area in the past as well, and Bernd is maybe the only guy who could pursue something such as reload-branch...

I hate to make "clubs" out of a community, but it looks like only some people care about the state of the code... Steven has done most of the work for removing the define_function_unit processor descriptions. I removed ~5000 lines of code after tree-ssa went in (including awful stuff such as protect_from_queue, which made sense maybe in 1990, and half of stmt.c). Kazu is also in the CSE-cleanup game. Maybe, like in my case, it's only because I have limited time to spend on GCC and think that cleaning up is a productive way to use this time. But anyway I think it is worth the effort.

Paolo

