[Bug middle-end/70236] New: Register allocation and loop unrolling lead to waste of registers

Tue Mar 15 10:40:00 GMT 2016

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70236

            Bug ID: 70236
           Summary: Register allocation and loop unrolling lead to waste
                    of registers
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vogt at linux dot vnet.ibm.com
                CC: vmakarov at gcc dot gnu.org
  Target Milestone: ---

Created attachment 37966
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37966&action=edit
ira dump

A new s390x pattern for shift-and-xor does not yield a satisfying result with
this code when compiled with "-O3 -funroll-loops":

-- snip --
unsigned long hash(unsigned long l)
{
  unsigned long v = 0;
  unsigned long i;

  for (i = 0; i < 8; i++)
    {
      v <<= 1;
      v ^= l;
    }
  return v;
}
-- snip --

=>

  lgr     %r1,%r2
  lgr     %r3,%r2
  rxsbg   %r1,%r2,0,62,1   # (shift r2 by one bit left and xor with r1)
  rxsbg   %r3,%r1,0,62,1
  lgr     %r1,%r2
  rxsbg   %r1,%r3,0,62,1
  lgr     %r4,%r1          <----- unnecessary
  lgr     %r1,%r2
  rxsbg   %r1,%r4,0,62,1
  lgr     %r5,%r1          <----- unnecessary
  lgr     %r1,%r2
  rxsbg   %r1,%r5,0,62,1
  lgr     %r0,%r1          <----- unnecessary
  lgr     %r1,%r2
  rxsbg   %r1,%r0,0,62,1
  rxsbg   %r2,%r1,0,62,1
  br      %r14

("%r1,%r2,0,62,1" means "r1 := r1 ^ (r2 << 1)"; the ",0,62,1" part of the
instruction effectively means "shift left by one".)

(gets worse with more loop passes).  The code got unrolled in tree:

  v_16 = l_4(D) << 1;
  v_17 = l_4(D) ^ v_16;
  v_21 = v_17 << 1;
  v_22 = l_4(D) ^ v_21;
  v_26 = v_22 << 1;
  v_27 = l_4(D) ^ v_26;
  v_31 = v_27 << 1;
  v_32 = l_4(D) ^ v_31;
  v_36 = v_32 << 1;
  v_37 = l_4(D) ^ v_36;
  v_41 = v_37 << 1;
  v_42 = l_4(D) ^ v_41;
  v_3 = v_42 << 1;
  v_5 = v_3 ^ l_4(D);
  return v_5;

Register allocation insists on having the value of "l" in r1.  As the result of
the previous pass through the loop is in r1, it's necessary to move that value
out of the way first.  Later on, regrename fails to clean up this situation,
probaby because the problem is too complex with many sequential overlapping
register use chains.

This:

  lgr     %r1,%r2
  rxsbg   %r1,%r3,0,62,1
  lgr     %r4,%r1
  lgr     %r1,%r2
  rxsbg   %r1,%r4,0,62,1
  lgr     %r5,%r1
  lgr     %r1,%r2
  rxsbg   %r1,%r5,0,62,1
  lgr     %r0,%r1
  lgr     %r1,%r2
  rxsbg   %r1,%r0,0,62,1

could be rewritten to

  lgr     %r1,%r2
  rxsbg   %r1,%r3,0,62,1
  lgr     %r3,%r2
  rxsbg   %r3,%r1,0,62,1
  lgr     %r1,%r2
  rxsbg   %r1,%r3,0,62,1
  lgr     %r3,%r2
  rxsbg   %r3,%r1,0,62,1

just using three registers.

The question is whether this situation can be improved, either in the register
allocator, or regrename, or in the pattern.