Bug 70236 - Register allocation and loop unrolling lead to waste of registers
Summary: Register allocation and loop unrolling lead to waste of registers
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 6.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, ra
Depends on:
Blocks:
 
Reported: 2016-03-15 10:40 UTC by Dominik Vogt
Modified: 2017-02-03 12:38 UTC (History)
2 users (show)

See Also:
Host:
Target: s390x-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
ira dump (2.48 KB, text/plain)
2016-03-15 10:40 UTC, Dominik Vogt
Details
rnreg dump (2.10 KB, text/plain)
2016-03-15 10:40 UTC, Dominik Vogt
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dominik Vogt 2016-03-15 10:40:11 UTC
Created attachment 37966 [details]
ira dump

A new s390x pattern for shift-and-xor does not yield a satisfying result with this code when compiled with "-O3 -funroll-loops":

-- snip --
unsigned long hash(unsigned long l)
{
  unsigned long v = 0;
  unsigned long i;

  for (i = 0; i < 8; i++)
    {
      v <<= 1;
      v ^= l;
    }
  return v;
}
-- snip --

=>

  lgr     %r1,%r2
  lgr     %r3,%r2
  rxsbg   %r1,%r2,0,62,1   # (shift r2 by one bit left and xor with r1)
  rxsbg   %r3,%r1,0,62,1
  lgr     %r1,%r2
  rxsbg   %r1,%r3,0,62,1
  lgr     %r4,%r1          <----- unnecessary
  lgr     %r1,%r2
  rxsbg   %r1,%r4,0,62,1
  lgr     %r5,%r1          <----- unnecessary
  lgr     %r1,%r2
  rxsbg   %r1,%r5,0,62,1
  lgr     %r0,%r1          <----- unnecessary
  lgr     %r1,%r2
  rxsbg   %r1,%r0,0,62,1
  rxsbg   %r2,%r1,0,62,1
  br      %r14

("%r1,%r2,0,62,1" means "r1 := r1 ^ (r2 << 1)"; the ",0,62,1" part of the instruction effectively means "shift left by one".)

(gets worse with more loop passes).  The code got unrolled in tree:

  v_16 = l_4(D) << 1;
  v_17 = l_4(D) ^ v_16;
  v_21 = v_17 << 1;
  v_22 = l_4(D) ^ v_21;
  v_26 = v_22 << 1;
  v_27 = l_4(D) ^ v_26;
  v_31 = v_27 << 1;
  v_32 = l_4(D) ^ v_31;
  v_36 = v_32 << 1;
  v_37 = l_4(D) ^ v_36;
  v_41 = v_37 << 1;
  v_42 = l_4(D) ^ v_41;
  v_3 = v_42 << 1;
  v_5 = v_3 ^ l_4(D);
  return v_5;

Register allocation insists on having the value of "l" in r1.  As the result of the previous pass through the loop is in r1, it's necessary to move that value out of the way first.  Later on, regrename fails to clean up this situation, probaby because the problem is too complex with many sequential overlapping register use chains.

This:

  lgr     %r1,%r2
  rxsbg   %r1,%r3,0,62,1
  lgr     %r4,%r1
  lgr     %r1,%r2
  rxsbg   %r1,%r4,0,62,1
  lgr     %r5,%r1
  lgr     %r1,%r2
  rxsbg   %r1,%r5,0,62,1
  lgr     %r0,%r1
  lgr     %r1,%r2
  rxsbg   %r1,%r0,0,62,1

could be rewritten to

  lgr     %r1,%r2
  rxsbg   %r1,%r3,0,62,1
  lgr     %r3,%r2
  rxsbg   %r3,%r1,0,62,1
  lgr     %r1,%r2
  rxsbg   %r1,%r3,0,62,1
  lgr     %r3,%r2
  rxsbg   %r3,%r1,0,62,1

just using three registers.

The question is whether this situation can be improved, either in the register allocator, or regrename, or in the pattern.
Comment 1 Dominik Vogt 2016-03-15 10:40:54 UTC
Created attachment 37967 [details]
rnreg dump