This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Patch to fix unintentional(?) pessimisation in dbr_schedule

This patch fixes what seems to be an unintentional pessimisation in
dbr_schedule.  The patch itself is a one-liner, but unfortunately it
comes with a long explanation.

The problem is in resource.c:mark_target_live_regs(), which tries to
work out the set of registers that are live before a particular insn X.
This is made somewhat complicated by the fact that dbr_schedule doesn't
keep all basic block information up-to-date.  It's therefore not safe
(in general) to rely on the global_live_at_start set from X's basic block.

However, according to resource.c:find_basic_block(), if a block has
no fall-through predecessors, its global_live_at_start information is
supposed to be trustworthy.  mark_target_live_regs() therefore scans
back to the previous BARRIER, takes the global_live_at_start set from
the following block, then scans forward from the head of that block to
see what registers are live at X.  (Small wonder that reorg shows
up so high in some profiles. ;)

m_t_l_r() can get a conservative estimate of the live register set by
looking at the insns between the barrier and X.  Every register that
is set by one of these insns is added to the live set.  Registers can
sometimes be removed from the set on the basis of REG_DEAD notes,
but there are complications (see the code for details).  That's why
the estimate is only conservative.

In order to prune the set further, m_t_l_r() uses find_dead_or_set_registers()
to scan the instructions after X, seeing if it can prove that certain registers
are dead.  Sometimes this is effective, but sometimes the control flow is
too complex for it to gain much headway.  So the final set is still going
to be on the conservative side.

Anyway, the upshot is, scanning back from the previous barrier gives
a conservative estimate of the live register set.  m_t_l_r() therefore
has a second strategy: if X falls through to an unconditional jump,
the function will try to work out the registers that are live at the
target of the jump, then work backwards to get another estimate of
the registers live before X.  Since both estimates are supersets
of the true set, their intersection should be too.

The code that implements this strategy is:

  /* If we hit an unconditional branch, we have another way of finding out
     what is live: we can see what is live at the branch target and include
     anything used but not set before the branch.  We add the live
     resources found using the test below to those found until now.  */

  if (jump_insn)
      struct resources new_resources;

-->   IOR_HARD_REG_SET (res->regs, new_resources.regs);

Note the indicated line.  It once used AND_HARD_REG_SET, thus taking the
intersection of the two estimates, as described above.  However, it was
changed to IOR_HARD_REG_SET by:

So this code, which as far as I can tell was supposed to be an optimisation,
now only has the effect of _expanding_ the original estimate.

The patch below basically reverts the one linked above.  Diffing -O2
c-torture output for sparc-sun-solaris2.8 shows that it triggers several
times.  The full diff is attached below, but a small example is

    typedef struct {
      unsigned char a;
    } A;

    unsigned int foo (A *x)
      unsigned char b[2] = { 0, 0 };
      unsigned char c = 0;

      c = (x->a) ? b[1] : b[0];

      return (unsigned int) c;

Before dbr_schedule, the end of the function looks like this:

    16: (set (pc)
	     (if_then_else (eq (reg:CC 100 %icc)
			       (const_int 0 [0x0]))
			   (label_ref 20)

    17: (set (reg:SI 24 %i0)
	     (zero_extend:SI (mem/s:QI (plus:SI (reg/f:SI 14 %sp)
					        (const_int 97 [0x61])))))

    57: (set (pc) (label_ref 22))

    58: barrier

    20: code_label

    21: (set (reg:SI 24 %i0)
	     (zero_extend:SI (mem/s:QI (plus:SI (reg/f:SI 14 %sp)
					        (const_int 96 [0x60])))))

    22: code_label

    34: (set (reg/i:SI 24 %i0)
	     (zero_extend:SI (reg/v:QI 24 %i0)))

We first fill the delay slot of insn 57 with insn 17:

    61: (sequence
	   (set (pc) (label_ref 22))
	   (set (reg:SI 24 %i0)
		(zero_extend:SI (mem/s:QI (plus:SI (reg/f:SI 14 %sp)
						   (const_int 97 [0x61]))))))

and then, when trying to fill the delay slot of insn 16, we want to know
what's live at insn 61.  Using the first scan-from-barrier technique, we
correctly deduce that register 24 is _not_ live.

We then get to the code quoted above.  Since insn 61 is an unconditional
jump, we try the alternative technique: that is, we calculate the registers
live at label 22, then add all registers that are used before they are set
in insn 61.

Now because register 24 is live at label 22, this alternative technique
will conservatively assume that it is live at insn 61 as well.  Since we
take the union of this estimate and the previous one, we assume that
register 24 is live at insn 61, and that putting insn 34 in the delay
slot of insn 16 requires insn 16 to be an annulled branch.

I wondered whether it would be better to simply get rid of the code
that calculates the second estimate.  However, there are cases in the
c-torture testsuite where the patch below is better than removing the
code outright.

Patch tested by bootstrapping and regression testing on mips64-linux-gnu,
mips64el-linux-gnu and sparc-sun-solaris2.8.  There were no regressions.
I couldn't see any sign of the problem that prompted the original change.

OK to install?  If so, I realise I'm on the hook to fix any problems
that the patch might expose.


	* resource.c (mark_target_live_regs): Take the intersection of the
	two estimates, not their union.

Attachment: mark-target-live-regs.diff
Description: Text document

Attachment: sparc-c-torture.diff
Description: Text document

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]