[Bug tree-optimization/71437] [7 regression] Performance regression after r235817
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Jan 16 10:28:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71437
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
                 CC|                            |amker at gcc dot gnu.org
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
With -fwhole-program there's no regression from GCC 6.2 to current trunk.
Without it I can still see a small regression (here 0.86s vs 0.92s).
From looking at the assembly it's hard to tell what the issue is. perf shows
hot spots at mispredicted branches, it seems (in both the good and the bad case).
In the .optimized dump I see that IVO makes different choices on trunk, even
though the input to IVO is more or less the same. Trunk ends up with
  <bb 6> [92.50%]:
  # i_138 = PHI <i_128(10), 0(5)>
  # ivtmp.78_378 = PHI <ivtmp.78_377(10), ivtmp.78_376(5)>
  _5 = (const int *) ivtmp.78_378;
  _366 = (void *) ivtmp.78_378;
  _6 = MEM[base: _366, offset: 0B];
  if (_6 > L.2_7)
    goto <bb 7>; [50.00%]
  else
    goto <bb 9>; [50.00%]

  <bb 7> [46.25%]:
  _370 = (unsigned int) i_138;
  _369 = _370 * 4;
  _10 = _369;
  _368 = ivtmp.78_378 + 4294967292;
  _367 = (const int *) _368;
  _11 = _367;
  _374 = (void *) ivtmp.78_378;
  _12 = MEM[base: _374, offset: 4294967292B];
  if (L.2_7 >= _12)
    goto <bb 8>; [7.50%]
  else
    goto <bb 9>; [92.50%]

  <bb 9> [89.03%]:
  i_128 = i_138 + 1;
  ivtmp.78_377 = ivtmp.78_378 + 4;
  if (i_128 != _371)
    goto <bb 10>; [92.50%]
  else
    goto <bb 11>; [7.50%]

  <bb 10> [82.35%]:
  goto <bb 6>; [100.00%]

while GCC 6 did
  <bb 8>:
  # i_153 = PHI <0(7), i_19(12)>
  _572 = (sizetype) i_153;
  _17 = MEM[base: pretmp_509, index: _572, step: 4, offset: 4B];
  if (_17 > pretmp_506)
    goto <bb 9>;
  else
    goto <bb 11>;

  <bb 9>:
  _591 = (sizetype) i_153;
  _22 = MEM[base: pretmp_509, index: _591, step: 4, offset: 0B];
  if (_22 <= pretmp_506)
    goto <bb 10>;
  else
    goto <bb 11>;

  <bb 11>:
  i_19 = i_153 + 1;
  if (i_19 != _573)
    goto <bb 12>;
  else
    goto <bb 13>;

  <bb 12>:
  goto <bb 8>;

but not sure if that ends up slower. GCC 6.2 asm:
.L23:
        movl    %edx, %eax
.L27:
        movl    4(%esi,%eax,4), %ecx
        cmpl    %ebx, %ecx
        jle     .L11
        movl    (%esi,%eax,4), %ebp
        cmpl    %ebx, %ebp
        jle     .L34
.L11:
        leal    1(%eax), %edx
        cmpl    %edi, %edx
        jne     .L23

GCC 7:
.L23:
        movl    %edx, %ecx
.L13:
        cmpl    %esi, (%eax)
        movl    %eax, %ebx
        jle     .L11
        cmpl    -4(%eax), %esi
        leal    0(,%ecx,4), %edx
        leal    -4(%eax), %ebp
        jge     .L30
.L11:
        leal    1(%ecx), %edx
        addl    $4, %eax
        cmpl    %edi, %edx
        jne     .L23

at least this is the most notable difference in the innermost loops on GIMPLE
(plenty of differences in the outer loop stuff).
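
To make the difference easier to see, here is a rough C rendering of the two inner-loop shapes as I read them from the dumps above. This is only a sketch, not the testcase from the bug: the function names, the `key` parameter (standing in for `L.2_7` / `pretmp_506`), the `-1` "not found" return value and the `i < n` loop guard are all assumptions made for illustration. The first form keeps a single counter and uses base + index*4 addressing, as GCC 6 did; the second carries an extra pointer induction variable next to the counter, as trunk does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the GCC 6 shape: one induction variable (i),
   both loads use base + i*4 (+4) addressing.  base must hold n + 1 ints.  */
static int find_indexed(const int *base, int n, int key)
{
    assert(base != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        /* MEM[base, index: i, step: 4, offset: 4B] > key ...  */
        if (base[i + 1] > key
            /* ... and MEM[base, index: i, step: 4, offset: 0B] <= key  */
            && base[i] <= key)
            return i;
    }
    return -1; /* assumed sentinel, not taken from the original code */
}

/* Hypothetical stand-in for the trunk shape: the counter i is kept for the
   exit test and a second induction variable p is bumped by one element
   (4 bytes) per iteration; the loads become p[0] and p[-1].  */
static int find_pointer_iv(const int *base, int n, int key)
{
    assert(base != NULL && n >= 0);
    const int *p = base + 1;
    for (int i = 0; i < n; i++, p++) {
        if (p[0] > key && p[-1] <= key)
            return i;
    }
    return -1; /* assumed sentinel, not taken from the original code */
}
```

Both functions compute the same result; the only difference is which induction variables stay live in the loop, which is what the extra `addl $4, %eax` and the `movl`/`leal` instructions in the GCC 7 assembly reflect.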
Bin, any idea why IVO makes the "bad" choice here?