[Bug tree-optimization/71437] [7 regression] Performance regression after r235817

Mon Jan 16 10:28:00 GMT 2017

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71437

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
                 CC|                            |amker at gcc dot gnu.org
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
With -fwhole-program there's no regression from GCC 6.2 to current trunk. 
Without it, I can still see a small regression (here 0.86s vs. 0.92s, about 7% slower).

From looking at the assembly, it's hard to tell what the issue is.  perf seems
to show hot spots at mispredicted branches (in both the good and the bad case).

In the .optimized dump I see that IVO (induction variable optimization) makes
different choices on trunk, even though the input into IVO is more or less the
same.  Trunk ends up with

  <bb 6> [92.50%]:
  # i_138 = PHI <i_128(10), 0(5)>
  # ivtmp.78_378 = PHI <ivtmp.78_377(10), ivtmp.78_376(5)>
  _5 = (const int *) ivtmp.78_378;
  _366 = (void *) ivtmp.78_378;
  _6 = MEM[base: _366, offset: 0B];
  if (_6 > L.2_7)
    goto <bb 7>; [50.00%]
  else
    goto <bb 9>; [50.00%]

  <bb 7> [46.25%]:
  _370 = (unsigned int) i_138;
  _369 = _370 * 4;
  _10 = _369;
  _368 = ivtmp.78_378 + 4294967292;
  _367 = (const int *) _368;
  _11 = _367;
  _374 = (void *) ivtmp.78_378;
  _12 = MEM[base: _374, offset: 4294967292B];
  if (L.2_7 >= _12)
    goto <bb 8>; [7.50%]
  else
    goto <bb 9>; [92.50%]

  <bb 9> [89.03%]:
  i_128 = i_138 + 1;
  ivtmp.78_377 = ivtmp.78_378 + 4;
  if (i_128 != _371)
    goto <bb 10>; [92.50%]
  else
    goto <bb 11>; [7.50%]

  <bb 10> [82.35%]:
  goto <bb 6>; [100.00%]

while GCC 6 did

  <bb 8>:
  # i_153 = PHI <0(7), i_19(12)>
  _572 = (sizetype) i_153;
  _17 = MEM[base: pretmp_509, index: _572, step: 4, offset: 4B];
  if (_17 > pretmp_506)
    goto <bb 9>;
  else
    goto <bb 11>;

  <bb 9>:
  _591 = (sizetype) i_153;
  _22 = MEM[base: pretmp_509, index: _591, step: 4, offset: 0B];
  if (_22 <= pretmp_506)
    goto <bb 10>;
  else
    goto <bb 11>;

  <bb 11>:
  i_19 = i_153 + 1;
  if (i_19 != _573)
    goto <bb 12>;
  else
    goto <bb 13>;

  <bb 12>:
  goto <bb 8>;

but I'm not sure whether that ends up slower.  GCC 6.2 asm:

.L23:
        movl    %edx, %eax
.L27:
        movl    4(%esi,%eax,4), %ecx
        cmpl    %ebx, %ecx
        jle     .L11
        movl    (%esi,%eax,4), %ebp
        cmpl    %ebx, %ebp
        jle     .L34
.L11:
        leal    1(%eax), %edx
        cmpl    %edi, %edx
        jne     .L23

GCC 7:

.L23:
        movl    %edx, %ecx
.L13:
        cmpl    %esi, (%eax)
        movl    %eax, %ebx
        jle     .L11
        cmpl    -4(%eax), %esi
        leal    0(,%ecx,4), %edx
        leal    -4(%eax), %ebp
        jge     .L30
.L11:
        leal    1(%ecx), %edx
        addl    $4, %eax
        cmpl    %edi, %edx
        jne     .L23

At least, this is the most notable difference in the innermost loops at the
GIMPLE level (there are plenty of differences in the outer loops).
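To make the two IVO outcomes easier to compare side by side, here is a hypothetical C reconstruction of the innermost loop, written once in each shape. The function names, the parameters a, n and key, and the exact source form are assumptions inferred from the two dumps (the loop loads a[i+1] and a[i] and compares both against a loop-invariant value); the real testcase may differ. Writing the loop by hand in one shape does not force GCC to keep it, because IVO is free to rewrite either version. The sketch only illustrates the two addressing schemes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction (not from the testcase): find the first i
   in 0 .. n-2 with a[i] <= key < a[i + 1]; return -1 if none exists.  */

/* GCC 6.2 shape: one induction variable i; both loads use a scaled-index
   address (base + i*4 + 4 and base + i*4 for 4-byte int).  */
long search_indexed(const int *a, size_t n, int key)
{
  assert(n >= 1);                 /* i + 1 != n would never stop for n == 0 */
  for (size_t i = 0; i + 1 != n; i++)
    {
      if (a[i + 1] > key && a[i] <= key)
        return (long) i;
    }
  return -1;
}

/* Trunk shape: a pointer induction variable p (like ivtmp.78) starting at
   a + 1 and advancing one element (4 bytes) per iteration, while the
   counter i is kept only for the exit test and the returned value.  The
   second load is p[-1]; the dump prints that offset as 4294967292B, which
   is -4 modulo 2^32.  */
long search_pointer(const int *a, size_t n, int key)
{
  assert(n >= 1);
  const int *p = a + 1;           /* one past the end when n == 1; never dereferenced then */
  for (size_t i = 0; i + 1 != n; i++, p++)
    {
      if (p[0] > key && key >= p[-1])
        return (long) i;
    }
  return -1;
}
```

Both functions return the same index for every input; they differ only in which induction variables are live in the loop and how the two loads are addressed.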

Bin, any idea why IVO makes the "bad" choice here?