[Bug tree-optimization/85275] New: copyheader peels off almost the entire iteration

Sat Apr 7 08:06:00 GMT 2018

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85275

            Bug ID: 85275
           Summary: copyheader peels off almost the entire iteration
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

I expected predcom to eliminate one of the loads in this loop at -O3:

int is_sorted(int *a, int n)
{
  for (int i = 0; i < n - 1; i++)
    if (a[i] > a[i + 1])
      return 0;
  return 1;
}

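Unfortunately, predcom bails out since the loads it sees are not
always-executed. Ideally loop header copying would make this a suitable
do-while loop, but in this case it duplicates too much: besides the exit
test in bb 5 it also copies bb 3, which holds both loads and the
comparison, so almost the entire first iteration is peeled off: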

;; Loop 1
;;  header 5, latch 4
;;  depth 1, outer 0
;;  nodes: 5 4 3
;; 2 succs { 5 }
;; 3 succs { 6 4 }
;; 4 succs { 5 }
;; 5 succs { 3 6 }
;; 6 succs { 1 }
Analyzing loop 1
Loop 1 is not do-while loop: latch is not empty.
    Will duplicate bb 5
    Will duplicate bb 3
  Not duplicating bb 4: it is single succ.
Duplicating header of the loop 1 up to edge 3->4, 12 insns.
[...]
  <bb 2> [local count: 114863532]:
  _17 = n_12(D) + -1;
  if (_17 > 0)
    goto <bb 3>; [94.50%]
  else
    goto <bb 6>; [5.50%]

  <bb 3> [local count: 108546038]:
  _18 = 0;
  _19 = _18 * 4;
  _20 = a_13(D) + _19;
  _21 = *_20;
  _22 = _18 + 1;
  _23 = _22 * 4;
  _24 = a_13(D) + _23;
  _25 = *_24;
  if (_21 > _25)
    goto <bb 6>; [5.50%]
  else
    goto <bb 5>; [94.50%]

  <bb 4> [local count: 906139986]:
  _1 = (long unsigned int) i_15;
  _2 = _1 * 4;
  _3 = a_13(D) + _2;
  _4 = *_3;
  _5 = _1 + 1;
  _6 = _5 * 4;
  _7 = a_13(D) + _6;
  _8 = *_7;
  if (_4 > _8)
    goto <bb 6>; [5.50%]
  else
    goto <bb 5>; [94.50%]

  <bb 5> [local count: 958878293]:
  # i_26 = PHI <0(3), i_15(4)>
  i_15 = i_26 + 1;
  _9 = n_12(D) + -1;
  if (_9 > i_15)
    goto <bb 4>; [94.50%]
  else
    goto <bb 6>; [5.50%]
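
Read back as C, the CFG above corresponds roughly to the following
hand-written sketch. The name is_sorted_peeled is made up for illustration,
and the return values are inferred from the original source, since bb 6 is
not shown in the excerpt:

```c
/* Hand-written approximation of the dump above, not compiler output. */
int is_sorted_peeled(int *a, int n)
{
  if (n - 1 <= 0)              /* bb 2: copied exit test */
    return 1;
  if (a[0] > a[1])             /* bb 3: copied body, i.e. peeled iteration 0 */
    return 0;
  int i = 0;
  for (;;)
    {
      i++;                     /* bb 5: increment and exit test */
      if (n - 1 <= i)
        return 1;
      if (a[i] > a[i + 1])     /* bb 4: both loads, still conditional */
        return 0;
    }
}
```

Both loads in bb 4 remain guarded by the exit test in bb 5, so they are
still not always-executed inside the loop.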

Throttling header copying down with --param max-loop-header-insns=5 gives
the expected optimization.
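
For reference, the shape I would hope to end up with can be written by hand
as below. This is only a sketch of the intended result, not what GCC emits;
the name is_sorted_carried and the variables prev and next are made up for
illustration. Only the exit test is copied in front of the loop, the body
becomes a do-while, and the value loaded as a[i + 1] is carried into the
next iteration so that each iteration performs a single load:

```c
/* Hand-written sketch of the desired do-while form with one load
   per iteration; not compiler output. */
int is_sorted_carried(const int *a, int n)
{
  if (n - 1 <= 0)          /* copied loop header: exit test only */
    return 1;
  int i = 0;
  int prev = a[0];         /* initial load, done once before the loop */
  do
    {
      int next = a[i + 1]; /* the only load left in the loop body */
      if (prev > next)
        return 0;
      prev = next;         /* reuse this value as a[i] next time around */
      i++;
    }
  while (i < n - 1);
  return 1;
}
```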