[Bug rtl-optimization/20211] New: autoincrement generation is poor

Fri Feb 25 22:08:00 GMT 2005

When a processor does not allow register+offset addressing for a register
class, as for the floating point registers on the SH3E / SH4, the way to
avoid excessive reloads and to expose the issue to the rtl optimizers is
to disallow this addressing mode for the machine modes for which pseudo
registers are usually allocated to the register class in question, i.e.
SFmode / DFmode in our example.
Thus, when there is a structure access to a member with such a mode and a
non-zero offset, the address cannot be expressed directly, so during rtl
expansion the sum is calculated into a pseudo register first.
cse typically places these additions together at the start of a basic
block; the idea there is that we might find some cse opportunities, and
if not, combine can do something with these sums.  However, that doesn't
work when these sums are used as addresses and the machine mode of the
access does not allow reg+offset addressing.  The auto-increment generation
in flow can't do anything useful with these sums either, since it only
looks for cases where a memory access already matches an existing add.
Thus we end up with lots of adds and sky-high register pressure.  On
two-address machines, there is an added problem: the adds are arranged so
that a sequence of at least two instructions (not counting reloads) is
needed to do the additions.
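
To make the situation concrete, here is a small C illustration.  It is
only a sketch: the structure and function names are made up for this
example and are not taken from the patch or from any test case.  The
first function is the kind of source that leads to one address sum per
float access; the second spells out by hand the single walked pointer
that post-increment addressing would allow.

```c
/* Hypothetical structure: the int header puts every float at a
   non-zero offset (4, 8, 12, 16 with 4-byte int and float). */
struct sample {
    int   id;
    float v[4];
};

/* Straightforward source.  On a target where SFmode accesses may not
   use reg+offset addresses, each s->v[i] needs its address computed
   into a pseudo register first, i.e. four separate adds. */
float sum_fields(const struct sample *s)
{
    return s->v[0] + s->v[1] + s->v[2] + s->v[3];
}

/* The same computation with one pointer that is advanced after each
   load.  With post-increment addressing this needs a single add to
   form the initial address and one live address register. */
float sum_postinc(const struct sample *s)
{
    const float *p = s->v;   /* the only explicit address computation */
    float sum = *p++;
    sum += *p++;
    sum += *p++;
    sum += *p;               /* last access needs no increment */
    return sum;
}
```

Both functions add the members in the same order, so they return the same
value; the difference is only in how the addresses are formed.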

What is required is an optimization pass that finds all the uses of a sum
of a base register and an offset in a basic block, figures out where an
auto-increment addressing mode can be profitably used, and reduces the
register pressure and the number of reg-reg copies.
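
The kind of bookkeeping such a pass has to do can be sketched with a toy
model in C.  This is not GCC code and not the approach of the patch; the
function names and the representation (a plain array of constant offsets
from one base register, in order of use, all with the same access size)
are assumptions made for illustration.  It ignores the copy that is
needed if the base register itself must stay live.

```c
#include <stddef.h>

/* Adds needed when every non-zero offset gets its own sum in a pseudo
   register, as happens today. */
size_t adds_naive(const long *offs, size_t n)
{
    size_t adds = 0;
    for (size_t i = 0; i < n; i++)
        if (offs[i] != 0)
            adds++;
    return adds;
}

/* Adds needed when a single pointer is walked with post-increment:
   an explicit add is only required when the next offset is not where
   the pointer already points. */
size_t adds_postinc(const long *offs, size_t n, long size)
{
    size_t adds = 0;
    long cur = 0;                 /* pointer starts at the base register */
    for (size_t i = 0; i < n; i++) {
        if (offs[i] != cur)
            adds++;               /* adjust the pointer before the access */
        cur = offs[i] + size;     /* post-increment after the access */
    }
    return adds;
}
```

For four 4-byte accesses at offsets 4, 8, 12 and 16, adds_naive counts
four adds while adds_postinc counts one.  A real pass would also have to
weigh pre-decrement, differing access sizes and the cost of keeping the
original base register live.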

A patch against 4.0 20050218 is here:

