branch probabilities dirty entry BB's aux, breaking flow

Alexandre Oliva aoliva@redhat.com
Fri Mar 22 10:54:00 GMT 2002


After the introduction of branch probabilities, ENTRY_BLOCK_PTR->aux
is non-NULL when entering calculate_global_regs_live.  The result is
that ENTRY_BLOCK_PTR is never enqueued, so its global_live_at_end is
never set.  This disabled the mechanism that forces pseudos live on
function entry into global register allocation, triggering the
problem in local-alloc mentioned in the comments right above that
chunk of code.  The following patch fixes the problem, which showed
up as a miscompilation of main() in
gcc.c-torture/execute/comp-goto-1.c on a Red Hat-internal MIPS
toolchain, with -mips16 -mabi=32.  Unfortunately,
since the bug is highly dependent on register allocation choices, and
extremely sensitive to minimal changes in the testcase, I was unable
to create a new testcase that would trigger the bug on more common
architectures.

All the gory details of the problem and the many insights I gained
while trying to debug it are included after the patch, for your
amusement, and for the record.  I mean, for me to be able to claim
the record for the highest time-spent-to-patch-size ratio in known
history (and probably in unknown history and prehistory too :-)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: flow-clear-aux.patch
Type: text/x-patch
Size: 748 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20020322/f6718877/attachment.bin>
-------------- next part --------------


Ok, so here's the long story, which I had already written by the time
I found out the real problem, so I thought I'd post it anyway.

Basically, the problem is that we have conflicting decisions between
local-alloc and reload/caller-save.  A snippet that is sufficient to
describe the problem, but not to reproduce it (since these register
allocation problems are very sensitive to minimal changes), follows:

  insn_t insn;
  host_addr_t a_page = (host_addr_t) malloc (2 * 4096);

  a_page = (a_page + 4096 - 1) & -4096;

  insn.f1.offset = LOAD32_RR;

This is all in a single basic block, and these variables are not used
in other basic blocks.  insn_t is a union with a member f1 that is a
struct with a member offset that is a bitfield.  a_page is an SImode
integer type.
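For concreteness, here is one hypothetical set of definitions
consistent with that description.  Only the type, member, and macro
names come from the snippet; the field widths, the extra members, and
the LOAD32_RR value are guesses, since the real headers come from a
Red Hat-internal tree:

```c
/* Hypothetical definitions consistent with the description above.
   The names insn_t, host_addr_t, f1, offset, and LOAD32_RR appear in
   the snippet; the field widths, the rest/raw members, and the value
   of LOAD32_RR are invented for illustration. */
typedef unsigned long host_addr_t;   /* an SImode integer on the
                                        32-bit host in question */
typedef union {
  struct {
    unsigned int offset : 16;        /* the bitfield that gets set */
    unsigned int rest   : 16;
  } f1;
  unsigned long long raw;            /* makes insn_t a DImode value,
                                        matching (reg:DI insn) */
} insn_t;

#define LOAD32_RR 0x23               /* placeholder opcode value */
```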

Just after register allocation, we have code like this generated for
this function:

;; (reg:DI insn) is uninitialized
;; initialization of (reg:SI a_page) omitted

[...]

(set (reg:SI temp) (const_int -4096))
(set (reg:SI a_page) (and:SI (reg:SI a_page) (reg:SI temp)))

[...]

(set (reg:SI temp2) (and:SI (reg:SI temp2) (subreg:SI (reg:DI insn) 0)))
(set (subreg:SI (reg:DI insn) 0) (reg:SI temp2))

Since both variables are only used in a single basic block, lreg
decides it can assign both of them.

Since their birth/death intervals (live ranges?) do not overlap (the
birth of reg insn is past the death of reg temp), lreg assigns both
of them to start at $a0, which is kind of reasonable.

The problem occurs when reload realizes that $a0 is a call-clobbered
register, and introduces spills and reloads of it.  First off, since
reg insn is live on entry of the basic block, its value is stored in
$a0's home stack slot before calling malloc().  Then, before the first
use of $a0/temp, save_call_clobbered_regs() loads $a0 back from the
stack, clobbering the value just assigned to reg temp.  Oops.

My first attempt at solving the problem was to get global-alloc to
mark in insn chains the death of pseudos in overlapping registers
chosen by local-alloc, such that reload would not introduce the
inappropriate restore.  It failed miserably; I had expected
chain->dead_or_set (passed to reg_becomes_live in regs_set) to be used
for this purpose, but it isn't.  Oh, well...  Here was the candidate
patch, anyway.  It might still be useful, since it actually improves
tracking of register life within global, but it's not necessary at
this point:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: global-overlap-of-locals.patch
Type: text/x-patch
Size: 1200 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20020322/f6718877/attachment-0001.bin>
-------------- next part --------------

The second attempt at disabling the inappropriate rematerialization
worked, by having caller-save keep track of stores itself.  This is
implemented in the following patch, which might also be useful at
some point if local-alloc is ever changed so as to remove the
current limitation.  Besides, it took me a while to get to it, so I
thought I'd at least post it :-)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: caller-save-set-disables-restore.patch
Type: text/x-patch
Size: 1497 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20020322/f6718877/attachment-0002.bin>
-------------- next part --------------

Then it dawned on me that local-alloc was actually being misled by
flow, which says register insn is only used within a basic block even
though it knows that insn is live on entry to that block.  I even got
worried that a testcase like the one below could cause a similar, but
even more serious, problem if the basic block were the body of a
loop.  Say, something like:

void foo () {
  int counter = 0;
  struct { long int i:7; long int padding:25; } x;

 label:
  some computation that involves temporary pseudos and function calls;

  x.i = !!counter * x.i + 1;

  if (counter < 10)
    goto label;
}

I admit this testcase is artificial, but I can see something like this
being generated after a number of transformations that GCC knows how
to do.  BB reordering and merging could transform a regular loop into
something like this, and the multiplication of the uninitialized x.i
by 0 in the first loop trip might as well be the result of some
if-conversion transformation that turned a conditional initialization
into such an expression.
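For what it's worth, a concrete, compilable variant of that sketch
could look like the following; the counter increment stands in for the
"some computation" line, and the return value exists only to make the
result observable (the masked read of the uninitialized x.i is, of
course, the whole point, so this formally reads an indeterminate
value on the first trip):

```c
/* Compilable variant of the artificial testcase sketched above.
   On the first trip, counter is 0, so !!counter masks the read of
   the uninitialized bitfield x.i; every later trip increments it. */
int foo (void)
{
  int counter = 0;
  struct { long int i:7; long int padding:25; } x;

 label:
  /* stands in for "some computation that involves temporary
     pseudos and function calls" */
  x.i = !!counter * x.i + 1;
  counter++;
  if (counter < 10)
    goto label;
  return x.i;   /* 1 after the first trip, then 2, 3, ... */
}
```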

Anyway, the end result is that x would only be used within a single
basic block, and its live range would extend from the instruction that
loads the word containing x to multiply x.i by !!counter, to the
instruction that stores x back.

Should one of the pseudos of the computation be assigned to the same
register as x, the value of x.i would be clobbered in every iteration
of the loop.  Unless there's some code that marks registers live at
the end of a BB as global...  And, indeed, there it is, in
propagate_block(), so GCC gets it right.

But I still didn't know about it (finding it is what later led me to
the actual solution, in the first patch) when I made yet another
attempt.  Back to the case of the single, non-looping basic block.
Even though it is safe
to consider that register insn is available for local register
allocation, doing so may require unnecessary spilling and reloading if
it happens to be assigned to a call-clobbered register.  And we would
want it to be assigned to a call-clobbered register, since its real
live range does not cross any function calls.  The problem is that
some uninitialized bits of the register cause it to be
considered live from the beginning of the basic block, so it would be
spilled, and might be needlessly rematerialized if it turned out to be
assigned to the same register as some other partial-word variable.

Thus, the next-to-last candidate patch I wrote for this problem was
to prevent pseudos found to be live at the beginning of a basic block
(used in a BB before they're set) from being considered for lreg
allocation.  The patch below actually fixed the problem, but for the
wrong reason.  Just a few minutes after writing it, I was struck by
the realization that life analysis is done backwards (doh!), which
means a correct patch would have to be slightly more involved.  The
idea could still be applied by tweaking mark_used_reg() so as to set
REG_BASIC_BLOCK to, say, -blocknum-3 (-3 because -1 and -2 are
already taken) instead of blocknum, adjusting mark_set_1() to
recognize that encoding and set it back to blocknum, and then adding
a pass over the REG_BASIC_BLOCKs at the end of register life analysis
that turned any remaining negative values into REG_BLOCK_GLOBAL (such
that, if a pseudo's first reference within its single basic block is
a use rather than a set, we'd refrain from using local-alloc to
choose a register for it).
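That idea could be modeled roughly as below.  This is a toy sketch of
the proposed encoding, not code that ever existed in GCC; only the
function and macro names echo the real ones, and the real
mark_used_reg()/mark_set_1() do far more than this:

```c
/* Toy model of the proposed REG_BASIC_BLOCK encoding for a backward
   scan of one basic block.  A nonnegative value names the single
   block in which the pseudo is both set and used. */
#define REG_BLOCK_UNKNOWN (-1)   /* no reference seen yet */
#define REG_BLOCK_GLOBAL  (-2)   /* must go to global allocation */

/* Scanning backwards, a use is recorded as "pending" via -bb-3
   (-3 because -1 and -2 are already taken). */
static void mark_used_reg (int *reg_basic_block, int bb)
{
  if (*reg_basic_block == REG_BLOCK_UNKNOWN)
    *reg_basic_block = -bb - 3;
}

/* A set seen later in the backward scan (i.e., earlier in the block)
   resolves the pending use: the register is defined in-block. */
static void mark_set_1 (int *reg_basic_block, int bb)
{
  if (*reg_basic_block == -bb - 3
      || *reg_basic_block == REG_BLOCK_UNKNOWN)
    *reg_basic_block = bb;
}

/* Final pass: any mark still in the -bb-3 range means the register
   was used before being set, i.e., it is live on block entry, so it
   must not be handled by local-alloc. */
static void finalize (int *reg_basic_block)
{
  if (*reg_basic_block <= -3)
    *reg_basic_block = REG_BLOCK_GLOBAL;
}
```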

-------------- next part --------------
A non-text attachment was scrubbed...
Name: flow-use-before-set-is-global.patch
Type: text/x-patch
Size: 1018 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20020322/f6718877/attachment-0003.bin>
-------------- next part --------------

While looking into this alternate method, I found the comment
explaining the problem in local-alloc, and then it was relatively easy
to figure out why we were failing to set
ENTRY_BLOCK_PTR->global_live_at_end properly.

The end :-)

-- 
Alexandre Oliva   Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer                  aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp        oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist                Professional serial bug killer

