This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/42720] New: Empty loop generated at unswitch-loops with -O2 -fprofile-use


The bug is triggered with -O2 -fprofile-use.

test case, loop.cpp:

int fun_b(int hbs[], int num, void *obj) {
 int i;
 int s = 0;
 for (i = 0; i < num; i++) {
   if (obj != 0) {
     if ((int)obj - hbs[i] > 0) {
       s += hbs[i];
     }
   }
 }
 return s;
}

int main () {
 int i;
 int s = 0;
 int hbs[100];
 for (i = 0; i < 100; ++i) {
   hbs[i] = i * 2000 + 100000;
 }
 for (i = 0; i < 20; ++i) {
   s += fun_b (hbs, 100, &hbs[i]);
 }
 return s;
}

Profile the program. Apparently the loop inside fun_b() is hot.

$ arm-eabi-g++ loop.cpp -O2 -fprofile-use --save-temps -c -o loop.o

We then see an empty loop (.L5) in function fun_b, executed when obj == 0.

_Z5fun_bPiiPv:
       @ args = 0, pretend = 0, frame = 0
       @ frame_needed = 0, uses_anonymous_args = 0
       @ link register save eliminated.
       cmp     r1, #0
       stmfd   sp!, {r4, r5}
       mov     r3, r0
       ble     .L57
       cmp     r2, #0   <--- "if (obj != 0)" is moved out of loop
       beq     .L5
     ....
.L3:
       ldmfd   sp!, {r4, r5}
       bx      lr
.L5:                     ;;  if (obj == 0), empty loop
       add     r2, r2, #1    ;;
       cmp     r2, r1        ;;
       bne     .L5           ;;
.L57:
       mov     r0, #0
       b       .L3

The empty loop (.L5) should have been eliminated. I have tested -O2 without
-fprofile-use, where the empty loop is gone.

I find that the root cause of the inefficiency with -O2 FDO is that, during
unswitch-loops, the simplification of loop conditions is skipped when FDO is on.

Let's say,
Version A: "-O2 -funswitch-loops", which does the right thing.
Version B: "-O2 -fprofile-use", which generates an empty loop that should
be eliminated.

Before the unswitch-loops pass, the (innermost, hot) loop is

 loop {
    if (obj != 0) {
      ...
    }
  }

Both version A and version B perform one pass of unswitch-loop on this loop
body.
In function tree_unswitch_single_loop(),
after "nloop = tree_unswitch_loop (loop, bbs[i], cond)", the loop becomes

if (obj != 0) {
 loop {               <---- original copy of the loop
   if (obj != 0) {
     ...
   }
 }
} else {
 loop {              <----- "nloop": a new copy of the loop
    if (obj != 0) {
      ...
    }
  }
}
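
For readers who prefer source over pseudocode, the transformation above can be
sketched by hand in C. This is only an illustration of the loop shape after
one unswitching step, not what gcc emits. The names fun_b_original,
fun_b_unswitched and unswitch_shapes_agree are hypothetical, and intptr_t
replaces the (int) cast of the test case so the sketch also builds on 64-bit
hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Original shape: the loop-invariant test on obj runs on every iteration. */
int fun_b_original(const int hbs[], int num, const void *obj) {
    int s = 0;
    for (int i = 0; i < num; i++) {
        if (obj != 0) {
            /* intptr_t stands in for the report's (int) cast (assumption). */
            if ((intptr_t)obj - hbs[i] > 0)
                s += hbs[i];
        }
    }
    return s;
}

/* Hand-unswitched shape: the test is hoisted and the loop is duplicated. */
int fun_b_unswitched(const int hbs[], int num, const void *obj) {
    int s = 0;
    if (obj != 0) {
        for (int i = 0; i < num; i++) {   /* original copy of the loop */
            if (obj != 0) {
                if ((intptr_t)obj - hbs[i] > 0)
                    s += hbs[i];
            }
        }
    } else {
        for (int i = 0; i < num; i++) {   /* "nloop": new copy of the loop */
            if (obj != 0) {               /* always false on this path */
                if ((intptr_t)obj - hbs[i] > 0)
                    s += hbs[i];
            }
        }
    }
    return s;
}

/* Hypothetical self-check: both shapes agree for a null and a non-null obj. */
int unswitch_shapes_agree(void) {
    int hbs[4] = {100000, 102000, 104000, 106000};
    assert(fun_b_original(hbs, 4, 0) == 0);
    return fun_b_original(hbs, 4, 0) == fun_b_unswitched(hbs, 4, 0)
        && fun_b_original(hbs, 4, &hbs[0]) == fun_b_unswitched(hbs, 4, &hbs[0]);
}
```

Calling unswitch_shapes_agree() from a small main() is enough to confirm that
the two shapes compute the same sum. The inner "if (obj != 0)" tests are still
present in both copies at this point; removing them is the job of the
simplification step discussed next.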

Then, right before the end of tree_unswitch_single_loop(), the function
recursively calls itself on the modified loops:
  tree_unswitch_single_loop (nloop, num + 1);

From here, Version A and Version B start to behave differently.

For Version A ("-O2 -funswitch-loops"), gcc continues looking for
unswitch-loop opportunities in the new loop "nloop".
It finds that the condition of the new loop can be simplified. Since obj is 0
on entry to the new loop, gcc replaces obj with 0. Thus the loop becomes

if (obj != 0) {
 loop {               <---- original copy of the loop
   if (obj != 0) {
     ...
   }
 }
} else {
 loop {                   <----- "nloop": a new copy of the loop
    if (0 != 0) {     <--- obj is replaced by "0"
      ...
    }
  }
}
Therefore, in the TODO pass cleanup-cfg, the "nloop" is entirely removed.

However, for Version B ("-O2 -fprofile-use"), gcc finds that the "nloop" is a
cold loop, so it returns immediately, without checking if the condition can be
simplified. Thus nloop is not cleaned up by the following cleanup-cfg pass and
results in an empty loop.
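
The difference between the two outcomes can be modelled at source level. The
sketch below is a hand-written approximation, not compiler output. The names
fun_b_version_a, fun_b_version_b and versions_agree_but_b_spins are
hypothetical, the wasted_iterations out-parameter exists only to make the
leftover counter loop observable, and intptr_t again replaces the (int) cast.

```c
#include <assert.h>
#include <stdint.h>

/* Version A outcome: the nloop copy folds to "if (0 != 0)" and is removed. */
int fun_b_version_a(const int hbs[], int num, const void *obj) {
    int s = 0;
    if (obj != 0) {
        for (int i = 0; i < num; i++) {
            if ((intptr_t)obj - hbs[i] > 0)
                s += hbs[i];
        }
    }
    return s;
}

/* Version B outcome: the nloop copy keeps its counter but has no useful body.
   wasted_iterations is a hypothetical out-parameter, not part of the report. */
int fun_b_version_b(const int hbs[], int num, const void *obj,
                    int *wasted_iterations) {
    int s = 0;
    *wasted_iterations = 0;
    if (obj != 0) {
        for (int i = 0; i < num; i++) {
            if ((intptr_t)obj - hbs[i] > 0)
                s += hbs[i];
        }
    } else {
        for (int i = 0; i < num; i++)
            (*wasted_iterations)++;   /* models the add/cmp/bne of .L5 */
    }
    return s;
}

/* Hypothetical self-check: same result, but Version B iterates num times. */
int versions_agree_but_b_spins(void) {
    int hbs[3] = {100000, 102000, 104000};
    int wasted = 0;
    int a = fun_b_version_a(hbs, 3, 0);
    int b = fun_b_version_b(hbs, 3, 0, &wasted);
    assert(a == b);
    return a == 0 && b == 0 && wasted == 3;
}
```

Both versions return the same value, so the problem is one of wasted work
rather than wrong code: with obj == 0, Version B still counts from 0 to num
before returning 0.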

The problematic code is in unswitch_single_loop() in loop-unswitch.c.

static void
unswitch_single_loop(struct loop *loop, ...)
{ ...
  /* Do not unswitch in cold areas.  */
  if (optimize_loop_for_size_p (loop))
    {
       dump
       return;
     }
  ...
  do
    {   ...
       /* Check whether the result can be predicted.  */
       for (acond = cond_checked; acond; acond = XEXP (acond, 1))
           simplify_using_condition (XEXP (acond, 0), &cond, NULL);
          ...
      } while (repeat);
    ...
  /* Unswitch the loop on this condition.  */
  nloop = unswitch_loop (loop, bbs[i], cond, cinsn);
  ...

  /* Invoke itself on modified loops.  */
  unswitch_single_loop (nloop, rconds, num + 1);
  unswitch_single_loop (loop, conds, num + 1);
  ...
}

To fix the empty loop problem, my thought is to propagate the conditions
immediately after nloop is inserted.
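
The ordering issue behind this idea can be shown with a toy model. Nothing
below is GCC code: struct toy_loop and every toy_* function are hypothetical
stand-ins, and the sketch only assumes that the fold step is cheap enough to
run before the cold-loop check.

```c
#include <assert.h>

/* Toy stand-in for a loop after unswitching; not a GCC data structure. */
struct toy_loop {
    int is_cold;        /* profile says the loop is rarely executed */
    int cond_is_known;  /* the guarding branch already fixed the condition */
    int cond_folded;    /* set once the in-loop test becomes a constant */
};

/* Stand-in for the simplification step: fold the test when it is known. */
void toy_fold_known_condition(struct toy_loop *l) {
    if (l->cond_is_known)
        l->cond_folded = 1;
}

/* Ordering described in the report: the cold check returns before folding. */
void toy_visit_cold_check_first(struct toy_loop *l) {
    if (l->is_cold)
        return;
    toy_fold_known_condition(l);
}

/* Alternative ordering: fold first, then decide whether to unswitch more. */
void toy_visit_fold_first(struct toy_loop *l) {
    toy_fold_known_condition(l);
    if (l->is_cold)
        return;
    /* further unswitching would be attempted here */
}

/* Hypothetical self-check on a cold loop whose condition is already known. */
int toy_orderings_differ_on_cold_loop(void) {
    struct toy_loop a = {1, 1, 0};
    struct toy_loop b = {1, 1, 0};
    toy_visit_cold_check_first(&a);
    toy_visit_fold_first(&b);
    assert(b.cond_folded == 1);
    return a.cond_folded == 0 && b.cond_folded == 1;
}
```

In the model, only the fold-first ordering leaves the cold copy with a constant
condition that a later cleanup could remove. Whether the real pass should fold
before the cold check or propagate the condition right after nloop is created
is exactly the open question.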

Any suggestions?

Thanks,
Jing


-- 
           Summary: Empty loop generated at unswitch-loops with -O2 -fprofile-use
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jingyu at google dot com
 GCC build triplet: X86_64-linux-gnu
  GCC host triplet: X86_64-linux-gnu
GCC target triplet: arm-unknown-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42720

