This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Reload work in progress


Every now and then, I'm working on a patch that changes several aspects of
how reload works.  So far, most of the discussion has been on Redhat
internal mailing lists; Jeff asked me to post a message here as well so
that people know about it.  I'm not appending the patch in this mail, it's
a bit too large (600K uncompressed).  If anyone is interested, I can send it
privately - it's against a not too old CVS version of gcc.

The patch tries to address what I perceive are several flaws of reload:
 - The enum reload_type/reload_when_needed handling, which tracks reload
   lifetimes (such as RELOAD_FOR_INPUT, etc.).
   From the original reload implementation, which didn't even have this, to
   the current one, this has grown enormously.  One of the problems with it
   is that it doesn't give us exact lifetimes of reload registers, and a lot
   of effort must be invested to get any useful life information out of it at
   all.  For example, there's a 100 line kludge near the end of find_reloads
   which has to modify some combinations of reload types because the code in
   reload1.c is incorrect and doesn't handle all possible cases properly.
 - Reload inheritance is a nice concept, but the implementation is just about
   incomprehensible and unmaintainable.  Given the size of it, it also
   doesn't really work all that well - witness the post-reload attempts like
   reload_cse_regs or the noop-moves code in jump.c which fix up what reload
   left behind.  Even with these additional mini-passes, we occasionally end
   up with obviously stupid code generation.
 - Reload register allocation.  Half a dozen places all over the reload pass
   set reg_rtx because each thinks it knows better how to allocate a reload
   register than the other ones (e.g. combine_reloads).

Here's what I'm trying to do about it:
 - All reload register allocation is now done during the first pass.
   choose_reload_regs/allocate_reload_reg are gone.
   The intent is to have one place that allocates reload registers, not half
   a dozen.  Also, doing inheritance in the first pass might, if done right,
   allow us to generate better code (e.g. if it turns out we won't need some
   address reloads, we don't need to allocate a register for them).
 - We try to track lifetimes accurately.
   * Every register that occurs in some way during an insn gets its own
     structure (struct reload_reg_use) which records the lifetime.
     "Registers" includes reload registers; there's a reload_reg_use
     structure for each reload.  Conflicts are checked based on these, so we
     can check conflicts between two reloads with the same function as
     conflicts between reloads and hard regs.
     There can be three ways a register is used: either it's a reload
     register, it's used as an input, or it's used as an output.  By keeping
     track of registers explicitly used in the insn (either as hard regs or
     as allocated pseudos), we can figure out exactly which regs conflict
     with each reload.  This might in a few cases lead to better code.
   * After find_reloads, we scan the pattern of the insn and search for
     register references and replacements, building up definition/use
     relations.  There are "reload_insn" structures which describe reload
     insns (or rather time slots for emitting them).  Two time slots are
     reserved for the reloaded insn itself (earlyclobber outputs and normal
     outputs).
   * After we know the definition/use relationships, there's a small
     scheduling pass (don't think of haifa, think of a screenful of code).
     Once that's done, we know the order of reload insns, and we can compute
     the "birth" and "death" fields of the "reload_reg_use" structures
     (second half of compute_reload_order).  Once that's done, testing for
     conflicts between reloads is a simple matter of comparing birth and
     death.  This is similar in a way to what reload_reg_free_for_value_p is
     trying to do right now.
 - Reload inheritance is rewritten. It's structured as a pass within reload
   and works with the lifetime data structures that were gathered.  It's
   deliberately done so as to be not so intrusive on unrelated code - it's
   a single block of code that could in theory be deleted without affecting
   anything else.
   It also works differently.  The old code runs in parallel with the actual
   reloading; insns are processed in order, and we keep track of which hard
   regs hold interesting values.  If it turns out we need one of these values
   again, we'll reuse the hard reg.  The new code looks at all insns in an
   extended basic block, gathering information about which insns reload
   identical objects, and will then try to replace as many of these reloads
   as possible with the same register.
 - The order of reload insns which we compute is kept until the second
   pass of reload, and emit_reload_insns just reuses it.  No need for
   reload_when_needed there either.
 - find_reloads and its subroutines do not allocate reload registers anymore.
   It's expected that find_reload_regs is powerful enough now.
   This affects e.g. combine_reloads which goes away with this patch.
   The only exception still left is find_dummy_reload, which isn't quite
   redundant yet because it's used to compute costs of alternatives.  It
   would be nice to kill that one eventually, too.

The goal is to significantly reduce complexity, without sacrificing average
generated code quality.  The "average" is important; obviously two entirely
different implementations won't behave identically on all inputs.

The patch doesn't even try to keep any of the old code in these areas
(exception: the code in find_reload_regs/find_reg which was developed in the
context of this patch but already went into gcc a while ago).  I started by
completely removing the inheritance and reload_type code and re-did things
from scratch.  This means that some optimizations may have been lost.  There
is a choice whether to re-add any code that could in theory still work within
the new framework, whether to adapt the new code to perform the same
optimization, or whether to ignore the issue because the code quality gains
don't justify the added complexity.

One of the main advantages of this new inheritance code is that it looks
at an extended basic block globally.  This means that there are more choices
available for it that it can use to generate better code.  The current code
remembers the choices made for reload regs in previous insns and will reuse
them at a later insn if they are still available.  The rewritten code can
choose to change the register that was used in the first insn if that means
it will live longer and thus be available for more inheritance opportunities.

It can also detect cases where there are for example several optional (but no
mandatory) reloads for the same object.  The current inheritance code only
kicks in if there's a mandatory reload at the beginning, it then inherits the
subsequent ones; but in this kind of situation only the new code can choose
to enable all the optional reloads if there's a suitable free register.

Right now the results tend to be comparable to or slightly better than the
existing code (_without_ the help of reload_cse_regs; parts of noop_moves
are also gone for now).  Occasionally there are blunders, but I think I know
how to fix those.  All tests were done on i386; I'm sure I'll have to tweak
things a bit before other platforms show similar results.  Things look
rather worse on the sh, for example; fixing that requires adding support for
inheriting PLUS expressions to the new inheritance code.  That's not hard,
but requires some more work.

Code size is way down: 100K less code in reload*.[ch] even without removing
reload_cse_regs (which is an extra 15K if it really becomes unnecessary).
The size will probably grow again, possibly quite a bit, with some additional
enhancements I've planned, and undoubtedly testing on other machines will
reveal a couple of necessary additions, but I'm still hoping for a code size
reduction in the end.

Downsides of the patch are increased memory consumption, and most likely
somewhat higher compile times as well.  I have not done any meaningful
benchmarks yet, though.


Bernd


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]