This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RFC: reload inheritance bug & patch


The attached testcase fails on 3.4 with:

    mips64el-elf-gcc -O2 inherit.c -Tidt.ld

I haven't yet found a similar testcase for mainline.

The root cause is a bug in reload inheritance.  We have rtl like this:

   A:  (set (reg:QI R2)
            (subreg:QI (reg:HI R1)))

   B:  (set (reg:DI R3)
            (ashift:DI (subreg:DI (reg:QI R2) 0)
                       (const_int 58)))

   ... later insns which use R1 ...

in which R1 is allocated to a general register (G1) and R2 is spilled
to the stack.  R2's stack slot naturally satisfies the destination
constraints of insn A so there are no reloads of R2 for insn B to inherit.
choose_reload_regs() therefore tries to inherit using the find_equiv_reg()
route instead.

find_equiv_reg() finds insn A and rightly says that (reg:QI R2) is
equivalent to (subreg:QI (reg:HI G1) 0).  As expected, the right hand
side gets simplified to (reg:QI G1):

		  else if (GET_CODE (equiv) == SUBREG)
		    {
		      /* This must be a SUBREG of a hard register.
			 Make a new REG since this might be used in an
			 address and not all machines support SUBREGs
			 there.  */
		      regno = subreg_regno (equiv);
		      equiv = gen_rtx_REG (rld[r].mode, regno);
		    }

and then there are various checks to see whether (reg:QI G1) can be
inherited here.  All these checks say it can, so an inheritance is
duly recorded:

	      /* If we found an equivalent reg, say no code need be generated
		 to load it, and use it as our reload reg.  */
	      if (equiv != 0
		  && (regno != HARD_FRAME_POINTER_REGNUM
		      || !frame_pointer_needed))
		{
                  ...
		  rld[r].reg_rtx = equiv;
		  reload_inherited[r] = 1;
                  ...
                }

However, the inheritance gets rejected by the next loop:

	  if (! free_for_value_p (true_regnum (check_reg), rld[r].mode,
				  rld[r].opnum, rld[r].when_needed, rld[r].in,
				  (reload_inherited[r]
				   ? rld[r].out : const0_rtx),
				  r, 1))
	    {
	      if (pass)
		continue;
	      reload_inherited[r] = 0;
	      reload_override_in[r] = 0;
	    }

because reload_reg_free_for_value_p() checks:

  if (TEST_HARD_REG_BIT (reload_reg_unavailable, regno))
    return 0;

and because G1 is included in reload_reg_unavailable (R1/G1 being live
throughout insn B).

Now that the code above has cancelled the inheritance, we're left with
an _uninherited_ reload from (reg:QI R2) to (reg:QI G1).  Although such
a reload correctly preserves the low byte of (reg:HI R1/G1), it clobbers
the high byte, causing trouble for any later uses of the register.

In this particular case, I suspect reload_reg_free_for_value_p() is
being too conservative.  There doesn't seem to be any reason why we
can't inherit a value that is invariant throughout the instruction.

[ FWIW, the original form of the reload_reg_unavailable check comes from:

      http://gcc.gnu.org/ml/gcc-patches/1999-11n/msg00333.html

  in which the cited reason was to treat asm clobbers as earlyclobbers.
  The current form of the check is due to:

      http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00386.html ]

However, the real problem seems to be that the checks for find_equiv_reg()
inheritance are too loose.  They only test free_for_value_p() when inheriting
a spill register:

    /* If we found a spill reg, reject it unless it is free
       and of the desired class.  */
    if (equiv != 0)
      {
        int regs_used = 0;
        int bad_for_class = 0;
        int max_regno = regno + rld[r].nregs;

        for (i = regno; i < max_regno; i++)
          {
            regs_used |= TEST_HARD_REG_BIT (reload_reg_used_at_all,
                                            i);
            bad_for_class |= ! TEST_HARD_REG_BIT (reg_class_contents[(int) rld[r].class],
                                                 i);
          }

        if ((regs_used
             && ! free_for_value_p (regno, rld[r].mode,
                                    rld[r].opnum, rld[r].when_needed,
                                    rld[r].in, rld[r].out, r, 1))
            || bad_for_class)
          equiv = 0;
      }

and on the face of it, just checking spill registers could lead to the
same sort of problem that brought about the patch linked above.

On the other hand, calling free_for_value_p() unconditionally might be
going too far the other way: it would check lower-priority reloads whose
reg_rtxes haven't been finalised yet.  (They might change later due to
inheritance.)

Perhaps an obvious fix would be to make the "regs_used/bad_for_class |="
loop reject the inheritance if a register is in reload_reg_unavailable.
Unfortunately, this seems to kill most of the advantage of the code.
I tried three versions of reload1.c:

  (a) Current mainline.
  (b) Current mainline with the fix just suggested.
  (c) Current mainline with find_equiv_reg() inheritance disabled altogether.

Compiling CSiBE on mips64-elf at -O2 showed the following differences
between (a) and (b):

    parser                                  5976     5984 :  100.13%
    parse                                  10384    10392 :  100.08%
    tblcmp                                  5168     5184 :  100.31%
    src/lpgparse                           56824    56840 :  100.03%
    src/prntstat                            4216     4224 :  100.19%
    src/remsp                               9344     9352 :  100.09%
    src/resolve                            15560    15576 :  100.10%
    jcsample                                3248     3256 :  100.25%
    jfdctint                                1424     1440 :  101.12%
    jdphuff                                 4360     4368 :  100.18%
    jdcoefct                                5152     5168 :  100.31%
    jidctint                                2416     2408 :   99.67%
    jquant2                                 6136     6152 :  100.26%
    rdtarga                                 3096     3104 :  100.26%
    transupp                                6544     6552 :  100.12%
    mspack/cabd                             9168     9176 :  100.09%
    mspack/lzxd                             8968     8976 :  100.09%
    test/cabextract_md5                    33420    33428 :  100.02%
    test/md5                                4496     4504 :  100.18%
    fs/jbd/recovery                         2824     2832 :  100.28%
    fs/nfs/read                             3200     3208 :  100.25%
    fs/ext3/inode                          17352    17344 :   99.95%
    fs/select                               3320     3328 :  100.24%
    mm/memory                              10568    10576 :  100.08%
    mm/filemap                             20048    20056 :  100.04%
    mm/vmscan                               3352     3360 :  100.24%
    mm/page_alloc                           6392     6416 :  100.38%
    mm/shmem                                9352     9360 :  100.09%
    ipc/sem                                 6328     6336 :  100.13%
    net/ipv4/route                         17540    17556 :  100.09%
    net/ipv4/ip_output                      7588     7596 :  100.11%
    net/sunrpc/xprt                         9428     9452 :  100.25%
    drivers/block/ll_rw_blk                 9760     9768 :  100.08%
    kernel/ptrace                           1736     1744 :  100.46%
    src/core/tcp_output                     3800     3808 :  100.21%
    src/netif/tcpdump                       1800     1808 :  100.44%
    dump                                    2004     2012 :  100.40%
    src/echo/renderEcho                     5744     5752 :  100.14%
    src/ell/test/tq                         4280     4288 :  100.19%
    src/gage/sclanswer                      5408     5416 :  100.15%
    src/hest/usage                          4520     4536 :  100.35%
    src/mite/txf                            9200     9208 :  100.09%
    src/moss/xform                          3456     3480 :  100.69%
    src/nrrd/filt                           7504     7512 :  100.11%
    src/nrrd/reorder                       13480    13488 :  100.06%
    src/nrrd/cc                            15216    15240 :  100.16%
    src/ten/glyph                           5928     5936 :  100.13%
    src/ten/tendSatin                       5448     5456 :  100.15%
    src/ten/tendGlyph                       8192     8200 :  100.10%
    ----------------------------------------------------------------
    Total                                3939049  3939537 :  100.01%

and these differences between (b) and (c):

    blocksort                               8608     8616 :  100.09%
    src/resolve                            15576    15600 :  100.15%
    jquant2                                 6152     6160 :  100.13%
    pnm2png                                 4272     4264 :   99.81%
    drivers/char/n_tty                      9472     9464 :   99.92%
    src/nrrd/measure                       13580    13588 :  100.06%
    src/nrrd/resampleNrrd                   9048     9040 :   99.91%
    src/nrrd/cc                            15240    15232 :   99.95%
    src/ten/chan                            9904     9912 :  100.08%
    ----------------------------------------------------------------
    Total                                3939537  3939561 :  100.00%

A spot check shows that most of (c)->(b) benefit comes from two things:

  (1) Secondary reloads.  Suppose we have a secondary reload:

          hardreg1 -> hardreg2 -> pseudoN

      reg_last_reload_reg[pseudoN] will be hardreg1, so that's what
      "normal" reload inheritance will try to use.  OTOH, find_equiv_reg()
      will find the move from hardreg2 to pseudoN and therefore try to
      inherit hardreg2 instead.  In many cases, hardreg2 is likely to
      belong to a more general class than hardreg1, in which case
      there's often more chance of it being suitable.

  (2) Situations like:

          (set (reg hardreg) (thing))
          ...
          (set (reg pseudoreg) (thing))

      where (reg pseudoreg) is spilled to the stack.  Although no
      inheritance takes place for the second insn, the find_equiv_reg()
      code will choose "hardreg" for the reload of "pseudoreg".  This gives
      more chance for clean-up later.

But as the (b)->(a) results show, most of the benefit of the current
code comes from cases where we can't inherit the reload.  In fact the
typical situation seems to very much like the one that's causing trouble.
We have sequences like:

     (set (reg:SI R2) (reg:SI R1))
     ...
     (set (reg:SI R3) (... (reg:SI R2) ...))

where, as above, R1 is allocated a hard register G1 and R2 is spilled
to the stack.  We again choose to use G1 for the input reload in the
second instruction, at first thinking that we can inherit the value
from the first insn.  This doesn't happen, so we end up with a reload:

     (set (reg:SI R2) (reg:SI G1))
     ...
     (set (reg:SI G1) (reg:SI R2))   <----
     (set (reg:SI R3) (... (reg:SI G1) ...))

but the post-reload optimisers can remove it.

When no subregs are involved, this procedure should be safe, since we're
simply reloading the value that the register already had.  It was still
surprising to me that all this find_equiv_reg() stuff doesn't seem to be
benefiting from true inheritance that much.  It certainly doesn't seem
to be explained that way in the comments.

Anyway, the patch below fixes the testcase and produces no change in CSiBE
score.  I've also bootstrapped & regression tested it on i686-pc-linux-gnu
(mainline).  I'm not 100% certain that this is the right approach though.

By the way, I think both normal and paradoxical subregs need this treatment.
Paradoxical subregs need it because of things like TRULY_NOOP_TRUNCATION.

Richard

Attachment: reg-equiv-inherit.diff
Description: Text document

#define MULTI1(X) \
	X( 0), X( 1), X( 2), X( 3), X( 4), X( 5), X( 6), X( 7), X( 8), X( 9), \
	X(10), X(11), X(12), X(13), X(14), X(15), X(16), X(17), X(18), X(19), \
        X(20), X(21), X(22), X(23), X(24), X(25), X(26), X(27), X(28), X(29)

#define COPY1(INDEX) c1[INDEX] = g[INDEX].s
#define COPY2(INDEX) c2[INDEX] = g[INDEX].bits.x1 + g[0].s
#define COPY3(INDEX) c3[INDEX] = g[INDEX].bits.x2 + g[0].s
#define COPY4(INDEX) c4[INDEX] = g[INDEX].s

union {
  struct { short x1:3, x2:3; } bits;
  short s;
} g[30];

volatile short c1[30], c2[30], c3[30], c4[30];

int main ()
{
  int i;
  for (i = 0; i < 30; i++)
    g[i].s = 0x7fff;
  MULTI1 (COPY1);
  MULTI1 (COPY2);
  MULTI1 (COPY3);
  MULTI1 (COPY4);
  for (i = 0; i < 30; i++)
    if (c4[i] != c1[i])
      abort ();
  exit (0);
}

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]