This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: CSE pass prevents loop-invariant motion


On 09/15/2015 06:11 AM, Robin Dapp wrote:
Hi,

recently, I came across a problem that keeps a load instruction in a
loop although it is loop-invariant.

A simple example is:

#include <stdio.h>

#define SZ 256
int a[SZ], b[SZ], c[SZ];

int main() {

   int i;
   for (i = 0; i < SZ; i++) {
     a[i] = b[i] + c[i];
   }

   printf("%d\n", a[0]);
}

The resulting loop code in s390 assembly is:
(built with -march=z196 -O3)

.L2:
   l     %r3, 0(%r1, %r5)  # load b[i]
   a     %r3, 0(%r1, %r4)  # add c[i]
   st    %r3, 0(%r1, %r12) # store a[i]
   larl  %r3, a            # <-- load a, why?
   aghi  %r1, 4            # incr i
   brctg %r2, .L2          # branch

followed by the printf call:

   lgf   %r3, 0(%r3)       # load a[0]
   larl  %r2, .LC1         # load "%d\n"
   brasl %r14, printf      # call printf

The cse1 pass detects that "larl %r3, a" and "lgf %r3, 0(%r3)"
can use the same internal register and unifies them.

loop2_invariant then realizes that "larl  %r3, a" is loop-invariant and
moves it out of the loop leaving a register-register move in the loop
that should be removed in some subsequent pass.
However, cprop3 replaces the source register of this move by the
constant symbolref "a" again, the moved "larl  %r3, a" is removed by
cse2 directly afterwards and we are back to square one...
(I could also post the corresponding RTL if requested)

The problem begins, imho, when cse1 extends the BB of the loop to the BB
after the loop and uses the same register for the load in the loop as
well as after the loop. This prevents PRE, cprop and loop2_invariant
from correctly moving the load outside of the loop.

I wonder if this behavior is intended in general or if I'm missing
something? On i386, my example works because the load of "a" before
printf is handled differently.

I wrote a patch that disables basic block extension out of loops in the
first CSE pass for s390 which fixes the problem for me. It uses a target
hook and should only affect s390 but is not yet thoroughly tested.

Any opinions/remarks on this?
You might want to check your costing model -- cprop is supposed to look at costs to help determine when propagations are useful.

jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]