[patch] Fix behavior of TER on unrolled loops

Wed Aug 24 12:08:00 GMT 2005

Hello,

> >>You must be thinking of different types of loop opts than i am.
> >>What high level loop optimization do you think requires knowledge of
> >>addressing modes?
> >
> >
> >strength reduction needs to know about addressing modes, but that's
> >about it.  No other loop optimization should need to care about them.
> >And I would not consider SR a "high level loop optimization".
> 
> We went through all this on IRC... no it's not a high level loop 
> optimization, but it's part of the middle end, and Dan claimed that that 
> should be completely target independent, which I thought was a silly notion.

first of all, I consider SR to be part of backend, and thus eligible to
know about target features.  Ivopts are currently positioned very late
in the tree optimizations stage, about at the point where I would like
to see the middle-end/back-end split in the future.  I do not consider
SR very appropriate for middle-end part for the following reasons:
 -- since SR by itself does not optimize the code unconditionally in any sense;
    on architectures where (and in the special cases when) multiplication and
    addition have the same costs, SR brings nothing,
 -- SR also does not make the code easier to optimize by other
    optimizers; if anything, it makes it more difficult by making the
    analysis of the values more complicated.

Concerning SR, there are two basic approaches, as far as I know:

1) To have strength reduction that is reasonably clever, knows about
things like addressing modes and number of available registers, and
directly produces a reasonable code.  This is the current approach of
GCC and seems to work quite well, while being reasonably simple.
IMO this is the only way that can work well on architecture that
has small number of registers and simultaneously rich addressing modes
(i.e., x86).

2) Have stupid strength reduction that just strength reduces everything
(modulo not creating induction variables with the same step differing
only by constant/simple invariant increments, and similar observations),
and some other pass(es) performing un-SR with knowledge of addressing
modes and registers as necessary (this may be part of register
allocator that has advantage of knowing the number of available
registers precisely, but the handling of addressing modes is still
nontrivial and probably would require a separate pass).

I personally am not in favor of this approach very much, as I find it
more senseful to make a simple pass (SR) more clever, than making the task
that is already almost impossible to solve as it is (RA) more complex.
That being said, this approach will work on RISC-type architectures that
basically do not have addressing modes to worry about.  This is true
especially in case if they have at least some 30 registers, at which
point RA becomes not an issue in most practical cases.  Given the
direction of progress in the microprocessors seen in last few years,
this may very well be the case for the main optimizations targets of
GCC in fairly short time.

Zdenek